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Abstract 

Likelihood Methods for the Detection 
and Characterization of Gamma-ray Pulsars 
with the Fermi Large Area Telescope 

Matthew Kerr 

Chair of the Supervisory Committee: 
Professor Thompson Burnett 
Physics 

The sensitivity of the Large Area Telescope (LAT) aboard the Fermi Gamma-ray Space 
Telescope allows detection of thousands of new 7-ray sources and detailed characterization of 
the spectra and variability of bright sources. Unsurprisingly, this increased capability leads 
to increased complexity in data analysis. Likelihood methods arc ideal for connecting models 
with data, but the computational cost of folding the model input through the multi-scale 
instrument response function is appreciable. Both interactive analysis and large projects — 
such as analysis of the full gamma-ray sky — can be prohibitive or impossible, reducing the 
scope of the science possible with the LAT. 

To improve on this situation, we have developed pointlike, a software package for fast 
maximum likelihood analysis of Fermi-LAT data. It is interactive by design and its rapid 
evaluation of the likelihood facilitates exploratory and large-scale, all-sky analysis. We 
detail its implementation and validate its performance on simulated data. We demonstrate 
its capability for interactive analysis and present several all-sky analyses. These include a 
search for new 7-ray sources and the selection of LAT sources with pulsar-like characteristics 
for targeted radio pulsation searches. We conclude by developing sensitive periodicity tests 
incorporating spectral information obtained from pointlike. 
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2.1 The probability that the first interaction of an incident photon in the given 
material is pair production. By 100 MeV, the cross section for photon inter- 
action in heavy elements is dominated by pair production. Reproduced from 

m 

2.2 The photon cross section in carbon (Z=6) and lead (Z=82). The greater 
binding energy of atomic electrons in lead allows photoelectric resonances to 
X-ray energies, while Compton scattering is the dominant process in X-rays 
for carbon. In both materials, pair production on nuclei becomes dominant 
at high energies, where the cross section is essentially flat. The absolute value 
of the cross section is significantly higher for lead, as a (x Z"^. Reproduced 
from [87] 

2.3 The canonical representation of a pulsar magnetosphere, reproduced from 
|45j . The magnetic moment of the neutron star (NS), ^, is inclined to the NS 
spin axis, fi, by an angle a. The field is assumed to be dominantly dipolar, 
though higher multipoles may be important [88j. The Goldreich-Julian equi- 
librium charge density [52] pcj vanishes along the null charge surface. The 
small patch on the NS surface at the foot of the open B field lines is the 
polar cap. The slot gap (dashed lines) and outer gap (hatching) lie along the 
last closed (open) field line. HE emission is beamed along B field lines, and 
the observer's inclination C determines which field lines are visible, strongly 
affecting the resulting light curve 



3.1 The Fermi-LAT point-spread function as characterized by "r68" and "r95", 
the angular separation from a point source within which 68% and 95% of 
photons from the source lie. The description here is for photons arriving close 
to the instrument's boresight; the angular resolution drops by up to 50% at 
the edge of the field of view (see Figure 14.1.2^ . As discussed in Chapter [21 
the energy-dependence originates from multiple scattering of pair-produced 
electrons and positrons in the tungsten foils, and the asymptote is set by the 
pitch of the silicon strips in the tracking layers 
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3.2 The distribution in azimuth for a set of reconstructed events (P6 "diffuse" 
class) with 100 MeV < E < 100 GeV in a 10° aperture centered on (l,b) = (0,- 
45), integrated from Aug. 4 2008 to Jul. 18 2010. Events with a reconstructed 
zenith angle > 105° have been discarded. (See Figure 14.1.21 which charts 
the zenith angle for this same data.) The Fermi-L AT has twofold bilateral 
symmetry, so we can obtain the "unique" azimuth angles, i.e., those that truly 
probe the dependence on the shape of the detector, by 4>unique = </>mod90° 
and, further, reflecting events with (^unique 

> 45° to 90° - 45°. The red 
horizontal line indicates the mean, while the blue gives the reconstructed 
unique azimuth in 1° bins. The deviation from the mean is quite modest. . . 

3.3 The effective area for Fermi-LAT as described by the P6-v3-diff IRE. The 
lefthand column shows the effective area of the front section of the detector 
(see ^2.2.ip and the righthand column the back. The top row shows the joint 
dependence of the effective area on energy and incidence angle. The second 
row shows the dependence on energy for a set of incidence angles, while the 
third row shows the dependence on incidence angle for a set of energies. We 
note that effective area is essentially zero below 100 MeV, turns on rapidly, 
and becomes essentially flat at 1 GeV. This behavior is governed by the 
difficulty in reconstructing "soft" photons 

3.4 Monte Carlo data from a detector simulation for a bin centered on 1334 MeV 
for front-converting events. The events have been summed over incidence 
angle up to 66.4°. The behavior of the density function over many orders of 
magnitude is well-described by a single King function, i.e., that data show a 
sharp core and power law tails. However, there is a slight break in the slope 
of the power law tail, and accordingly the King function cannot provide a 
perfect fit, leaving residuals > 10%. In later versions of the IRE, this issue is 
ameliorated by the use of two King functions 

3.5 

4.1 The compression ratio of the data, i.e., the mean photons per pixel, for a 
given energy band. The occupancy drops rapidly as the energy approaches 1 
GeV due to the improving PSE and falling spectra of astrophysical sources. . 

4.2 The PSE — characterized by the radius in which 68% events are enclosed — as 
a function of incidence angle, energy (in GeV), and conversion type ("CT" in 
legend). Generally, the angular resolution is best on-axis and degrades as the 
edge of the field-of-view is approached. Using a larger containment radius, 
e.g., 95%, as a metric yields nearly the same dependence on incidence angle 
and magnitude of shift. The standard cut on reconstructed incidence angle 
for pointlike of cos(^) = 0.4, or > 66.4°, is indicated by the vertical black line 
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4.3 Reconstructed events (P6 "diffuse" class) with 100 MeV < E < 100 GeV in 
a 10° aperture centered on (l,b) = (0,-45), integrated from Aug. 4 2008 to 
Jul. 18 2010. The binning is in the reconstructed zenith angle. The effect of 
the PSF is clearly seen in the broadening of the distribution of limb photons 
about « 113° 

4.4 Comparison of the geometric mean (-E^geo = ^ -^max ^ -^min) of an energy 
band with the optimal energy -Eopt as estimated by the log likelihood method 
described in the text. The binning is uniform in logarithmic energy with 8 
bins per decade and separate bands for front- and back-converting events are 
shown. The relative difference of -Eopt from -Egeo for three different power law 
source spectra with photon indices 7 = 1.5, 2.0, 3.0 is shown 

4.5 Two power law spectra {fi{E) oc E"^'^ and f2{E) oc i?~^-°) integrated over 
the most problematic energy band in a typical analysis, 100-133 MeV in 8 bin- 
per-decade mode and 100-178 MeV in 4 bin-per-decade mode. The relative 
difference with the exact integral (as determined by numerical integration to 
machine precision) is shown as a function of the number of intervals used in 
the composite Simpson's rule 

4.6 The exposure variation over a typical ROI (here, located at the position of the 
Vela PSR.) The approximated exposure assumes the same energy dependence 
as that at the ROI center and corrects only for the normalization at the 
geometric mean energy of the band. The exact value accounts for the spatial 
variation at all energies. Relative differences are shown for point sources at 
a modest (5°) and extreme (10°) separation from the ROI center. The band 
chosen, 100 < £'/MeV < 178, has rapidly-varying effective area, making this 
a worst case. The error is < 0.5% 

5.1 The PSF profile for bands with front-converting events between 100 and 1000 
MeV. The x-axis gives the angular separation of the bin center from the point 
source position in degrees and is on a logarithmic scale. The y-axis indicates 
the relative difference, observed less predicted, in percent. These results are 
for a source with a power law spectrum of photon index 2.0 

5.2 As Figure 15.1.11 for bands with front-converting events between 1000 and 
10000 MeV 

5.3 As Figure 15.1.11 for bands with front-converting events between 10000 and 
100000 MeV 
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5.4 Comparison of modeled and observed counts for a point source with a F = 2.0 
spectrum. The histogram gives the model expectation and the data points 
the observed counts. The residuals indicate that the accuracy is better than 
1% at all energies. The minor but definite trend for an excess in the expected 
counts below 1 GeV can lead to a very small bias in the estimation of the 
photon index — see Table 15. 2[ It is in this energy range the effects of PSF 
approximation and "ragged" edge effects are most pronounced 

5.5 Comparison of modeled and observed counts for a point source with a T = 1.5 
spectrum. We include this model for its superior statistics at high energy, 
where it is clear the accuracy remains better than 1% 

5.6 The lefthand panels show the weighted residuals, (A^o6s — ^mod) / V ^mod, for 
the diffuse background {glLiem_v02 + isotropic-iem-v02) at low energy in 
zenithal equal-area (ZEA) projection. The top (bottom) row shows front- 
converting (back-converting) events. The "ragged edge" of the HEALPix 
data compared to the ROI boundary (blue circle) is clear at these low energies, 
particularly for the back events. The righthand panels show a histogram of 
the weighted residuals. There are many counts per pixel for these low energies 
(mean 102 (360) for front (back)), and the residuals should (and do) follow 
a normal distribution (shown in red). The means of the distribution are 
consistent with 0, indicating no statistically significant bias 

5.7 Weighted diffuse residuals at moderate energies. The spatial residuals are 
again featureless. The count rate per pixel is at the limit of normal approx- 
imation (8 (24) for front (back)), and the residuals for the front-converting 
events clearly show a long right tail in keeping with the asymmetric Poisson 
distribution. (The mean is accordingly slightly biased.) The "ragged" edge 
is essentially eliminated by 1 GeV for nearly any reasonable ROI size 

5.8 The predicted and observed counts for the diffuse background [glLiem-v02 
+ isotropic_iem-v02) for all energies and conversion types 

5.9 As Figure 15.1.21 but for a power law source with photon index of 1.5 and 
allowing for energy dispersion in the data. The model value is fixed at the 
Monte Carlo truth, so the residuals indicate how the photons have been 
redistributed from the true values. The intrinsic spectrum of hard sources 
enhances the tendency for high energy photons to redistribute to low energies, 

5.10 As Figure [5. 1.51 for a power law source with photon index of 3.0. Soft sources 
show a decreased curvature at low energies, but the overall loss of photons 
from the passband is increased 
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5.11 Results for T = 2.0 in error-weighted units. The contours give the 68%, 95%, 
and 99% confidence intervals as estimated from a Gaussian distribution using 
the sample covariance matrix. Blue circles (red squares) are the 68% (32%) of 
the sample closest to (farthest from) the mean. The single, black errorbar at 
the center of the contours gives the sample mean, and the length of the error 
bars gives the 1-d sample standard deviations divided by ^/N. The black axis 
marks crossing at the origin represent the simulated ("true") values. Thus, 
a failure of the cross to overlap the origin indicates a systematic bias in the 
parameter estimate. The position of the cross relative to the origin gives the 
magnitude of the bias in "sigma" units. Sources with a test statistic < 10 
are indicated with a cross-filled symbol 

5.12 As Figure [5.2.11 in relative units. The thick dashed line indicates zero flux. 
Absolute outlying values are more apparent at low fluxes 

5.13 The angular deviation in absolute units (°) for the position in the R.A. and 
Decl. directions. The number of sources with successful position fits is indi- 
cated by 

5.14 As Figure [5.2.21 but with error- weighted units 

6.1 The model-independent spectrum for IFGL J2015. 7+3708. At upper left, the 
initial spectrum estimated with all sources fixed at IFGL values. At upper 
right, the spectrum after adding cutoff parameters to PSRs J2021+3651 and 
J2021+4026. At lower left, no external sources have changed but we model 
IFGL J2015. 7+3708 with a log parabola (see text). Finally, at lower right, 
we have added the new source discussed in the previous section 

6.2 Spectra for PSRs J2021+3651 and J2021+4026 

6.3 The integral photon flux of IFGL sources above 300 MeV. The vertical lines 
indicate the flux thresholds in the iterative all-sky fit 

6.4 The log likelihood changes in each ROI for each iteration. The iterations 
proceed from left to right and top to bottom. The precise quantity shown 
is sign(Alog£) x a/ (A log £, and the scale extends from —5 (dark blue) to 
5 (dark red). For these large ROIs, a change in log likelihood of order a 
few is negligible. In the upper lefthand corner — the initial iteration — the log 
likelihood shift is dominated by the initial fit of the parameters for the diffuse 
models. The third through sixth panels show the log likelihood improving 
as more sources are allowed to vary. The brightest sources are concentrated 
in the Galactic plane (selection and projection effects), so only by the fifth 
panel are a significant fraction of sources at all latitudes free. The final three 
panels show the refinement with all point sources varying. In the final panel, 
essentially all ROIs have ceased to improve. Four pixels at the Galactic center 
continue to shift by order a few in an "oscillation" described in the main text 
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6.5 A comparison of the ML estimates obtained by the pointlike ASA and the 
unbinned hkeUhood pipeUne analysis of the IFGL catalog. All sources are 
fit with a power law. The flux density is evaluated at the "pivot energy", 
an estimate of the energy at which the covariance of the photon index and 
flux density is minimized. Sources in red correspond to IFGL sources with a 
"c" appended, indicating they lie along the Galactic ridge and their spectral 
values should be taken with caution. Many of these sources may be spurious. 
A subset of these sources clearly departs from the main population for which 
pointlike and the IFGL values are in excellent agreement 

6.6 In the lefthand panel, the difference in the ML parameter estimates scaled by 
the parameter error estimated by the IFGL analysis. The photon indices are 
in excellent agreement, while the flux densities shows a small trend toward 
underestimating the flux. In the righthand panel, we compare the uncertainty 
estimated by each analysis. This comparison is made by estimating the rel- 
ative error for each method and then taking the difference (IFGL - pointlike 
ASA). The error estimated by pointlike is generally in excellent agreement, 
with some flux error estimates exceeding the IFGL estimate by up to 20%, 
an unimportant difference. It is in any case not surprising that an unbinned 
method may deliver slightly smaller error estimates 

6.7 The best-fit values of the parameters for the two scaling models discussed in 
the text. The null value for the constants is 1, while for the photon index 
of the power law scaling the Galactic diffuse, 0. The abundant photons 
in the Galactic plane strongly constrain the Galactic diffuse model, while 
at high latitudes the sparse photons allow large fluctuations. The isotropic 
background is generally constrained to within 10% independent of position 
on the sky. 

6.8 An image (in Hammer-Aitoff projection of Galactic coordinates) of the GeV 
sky produced by kernel density estimation with the PSF with 18 months of 
data. The units are somewhat arbitrary but can be roughly interpreted as 
counts per steradian. All energies > 100 MeV are included 

6.9 An image (in zenithal equal-area projection of Galactic coordinates) of the 
GeV sky produced by kernel density estimation with the PSF with 18 months 
of data. The image shows the Cygnus region and has overlaid the locations 
of sources in the IFGL catalog. In constructing this image, photon energies 
were restricted to 2200 < E/MeV < 10000 (4400 < E/MeV < 20000) for 
events converting in the front (back) of the detector. Due to the ~ 1/E 
dependence of multiple scattering and the factor of w 2 difference in the 
radiation length of the front and back radiations, the two samples of events 
have approximately the same angular resolution 
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6.10 A TS map constructed of the Cygnus region to demonstrate the source finding 
principle discussed in the text. In the top panel, only the best-fit diffuse model 
(Galactic + isotropic) is used for a background model. All point sources then 
contribute to the calculated TS, and the map is dominated by the exceedingly 
bright pulsars J202 1+4026 and J202 1+3651. The range spanned by the color 
scale is 25-40,000. In the second panel, the two bright pulsars are included in 
the background, and the color scale is now 25-2200, with the combination of 
Cygnus X-3 and PSR J2032+4127 setting the top end. In the bottom panel, 
all IFGL sources are included in the background model, and the scale is now 
25-180. The excess at ^ (74.5,0.5) is a new point source Ill4 

6.11 A gallery of typical plots used for classifying unidentified sources as good pul- 
sar candidates. In the upper left panel, a known LAT PSR J2021.5+4026[3] 
gives an example of a typical pulsar spectrum: flat or rising in the 0.1 — 1 
GeV decade, a strong cutoff, and a low "variability index" . In the upper right 
panel, a much dimmer pulsar candidate (at which location radio pulsations 
were subsequently detected), which shows similar features scaled down by 
two orders of magnitude. At bottom left is the quasar 3C 454. 3pl] exempli- 
fying the features of typical AGN emission: a falling spectrum in the 0.1 — 1 
decade and extreme variability. At bottom right, a comparable (unidentified) 
source showing similar features scaled down by two orders of magnitude in 
flux. It has modest variability and would not be considered a good candidate 



due to this and its falling spectrum at low energy. 111? 



7.1 The observed distribution for the statistic for various sample sizes and 
a collection of maximum harmonics. The asymptotic calibration is shown as 
a solid line, while the solid band gives the empirical distribution function of 
the Monte Carlo realizations. The width indicates the statistical uncertainty 
estimated as y^A^(>= TS), i.e., the square root of the number of counts with 

a TS greater than or equal to the current TS 1291 

7.2 The observed (Monte Carlo) and predicted (asymptotic) distribution for the 
weighted test statistic Z^^. The procedure is identical to that described for 
Figure 17.3.11 save for the addition of weights and the omission of the m = 20 
case for decreased computational burden. The addition of weights is shown 
to make no difference in agreement with the predicted distribution; only 



insufficient sample size can leads to a discrepancy Il31 
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7.3 A comparison of the dependence on event selection of H20W and H2o- A source 
with flux 10^^ ph cm~^ was simulated and a "Vela-like" light curve was 
added to the source photons. The spectrum was that of Eq. 17.211 The test 
statistics were calculated over a grid of selection criteria: the y-axis gives 
the extraction radius (maximum angular distance from the pulsar allowed) 
and the x-axis indicates the threshold energy, the minimum energy allowed 
in the selection. The lefthand (righthand) columns shows the results for 
the weighted (un-weighted) test statistic. The two rows are two indepedent 
Monte Carlo realizations of the source and background. The test statistics 
were converted to a units and independently normalized to the maximum 
observed value. It is clear that the weighted statistics have a minimal depen- 
dence on the extraction criterion and, more importantly, do best when given 
the most data (maximum in upper lefthand corner of cut space). On the 
other hand, the significance can be strongly peaked for the standard statistic 
and the optimal cut varies from realization to realization 

7.4 A comparison of the weighted and un-weighted versions of three statistics. 
Photons with energies above 200 MeV are selected according to the extraction 
radius on the x-axis. The light curve in this case is a Gaussian with a = 0.03, 
so the H test and high-harmonic Z test are much more sensitive than the Z2 
tests. The salient feature is the relative independence of the detection flux 
threshold on extraction radius for the weighted methods. The sensitivity for 
the standard tests drops by about 50% as we increase the aperture size. The 
results indicate that little is gained for weighted methods with an extraction 
radius larger than 2° 

7.5 The dependence of significance in a units as a function of flux. The reported 
value at a given flux is that attained by at least 68% of the ensemble of 50 
MC realizations. The trend is approximately linear, as explained in the main 
text, making it simple to invert the relation. The light curve here is a single 
Gaussian peak with a = 0.03 

7.6 The 68% flux detection threshold based on a 4(T detection criterion for a 
single-peaked Gaussian light curve as a function of duty cycle. Light curves 
with narrow (broad) peaks lie to the left (right). The un-weighted thresholds 
are about twice those of the weighted thresholds, indepedent of duty cycle or 
statistic 

7.7 The templates used for the single- and double-peaked light curves in the deter- 
mination of the detection flux thresholds. The functional form is a wrapped 

Gaussian, i.e., /{(/)) = Yl 5'(</' + with g{(l},fi,a) = {2tt)~^'^ ex.p{—0.5 {(/) — 

i=OD 

fi)'^/a'^). In the first panel, fi = 0.5, while in the second panel, ^1 = 0.25, 
and fi2 = 0.70. In this panel, the ratio of the peak heights is 3/2, and the 
peak widths are identical 
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7.8 The 68% flux detection threshold based on a 4c7 detection criterion for a 
double-peaked Gaussian light curve as a function of duty cycle. The flux 
threshold for the weighted statistic is 1.5 — 2.0 times lower than the un- 
weighted statistics 145l 

7.9 The 68% flux detection threshold based on a 4a detection criterion for a 
single-peaked Gaussian light curve as a function of duty cycle. Here, the 
pulsed fraction has been decreased to 50%, i.e., half of the photons from the 
source are uniformly distributed and half are drawn from the Gaussian light 
curve. The flux threshold for the weighted statistic is 1.5 — 2.0 times lower 
than the unweighted statistics, with some slight dependence on duty cycle 



and statistic Il47 

7.10 The signiflcance for each member of the ensemble described in the text cal- 
culated using weights derived from the model used to simulated the data 
(the Monte Carlo "truth") and derived from the ML model. The ML-derived 



values are very similar to those obtained using the known spectrum 1491 

7.11 A comparison of the signiflcance of pulsed detection versus unpulsed detec- 
tion. The pulsed detection signiflcance was calculated using the ML best-flt 
spectrum with the weighted H test, while the unpulsed signiflcance was de- 
rived from a likelihood ratio test as described in the text. For nearly all 



sources, the pulsations are more strongly detected than the DC emission. . . Il51 
7.12 As Figure 17.5.21 but with weights derived from a ML flt with the pulsar 



spectrum flxed to a power law with F = 2.0 and only the flux allowed to vary. 1521 



A.l We show the survival function (1 — F(H)) of the asymptotic distribution for 
H and for a sample distribution (N = 10^) drawn from the null distribution 
by simulation. To reduce the scale, we divide the survival function for a 
variable with two degrees of freedom, viz. 1 — F{x) = exp(— 0.5j;). For 
each maximum harmonic, the sample distributions agree with the asymptotic 
calibration 1661 

A. 2 We show the survival function (1 — F(H)) of the asymptotic distribution for 
H and for a sample distribution (N = 10^") drawn from the null distribution 
by simulation. To reduce the scale, we divide the survival function by the 
"flrst-order" approximation 1—F{H) ~ exp(— 0.4jJ)[41j. This approximation 
gives excellent results (< 5% deviation from the true distribution) to H = 
50 and above. We have marked the values H corresponding to (two-sided) 



signiflcance levels of 3a, 4cr, and 5a for reference Il67 
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A. 3 The ratio of the true asymptotic distribution for H20 to the approximate ex- 
pression exp(—0. 4/^20) • At H20 ~ 40, the survival function of the asymptotic 
distribution begins to decrease more rapidly than exponential approximation, 
indicating the importance of a quadratic term. We show such a term, de- 
termined by least squares, with the red curve, which provides an accurate 
estimation of the survival function over a broad range 168l 



B.l The resulting "signal" after processing an input with -F$ as outlined in 
the text. We simulate 10^ realizations from a von Mises distribution with a 
sharp peak and then generate histograms of for each of three distri- 

butions. The uniform distribution is trivial and represents the unprocessed 
signal. A sinusoidal distribution might arise, e.g., if the period of interest is 
very close to twice the orbital period of the S/C. An exponential distribu- 
tion is difficult to produce physically but provides insight into asymmetric 
distributions. In the upper left panel, we plot /$(</>) for each of the exposure 
scenarios. In the remaining three panels, we show the processed signal for 
three different absolute phases of the signal. Clockwise from upper right, the 



signal is centered 4> = 0.5, 0.8, and 0.2 Il71 
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5.1 The change in the log likelihood for 20 combined years of Monte Carlo data for 
three different spectral shapes when comparing band PSFs based on Egeo, the 
geometric mean energy for the band, and Eopt, the optimal energy estimated 

by Eq. 11231 [tI 

5.2 The accuracy of maximum- likelihood estimated parameters for three point 
sources with a flux of 2 x 10~''ph cm~^ s~^ and three different power law 
photon indices. The agreement is much better than 1% for all three sources, 
and there is additionally good agreement between the estimates for front- 
converting events and back-converting events. The general trend is for the 
source-as-fit to have a harder spectrum than simulated; this is in keeping with 
the slight excess of modeled events at low energy, as a hardened spectrum 
ameliorates this excess M 

5.3 As Figure [521 but with the Galactic and isotropic diffuse backgrounds added. 

The agreement remains at the 1% level [s^ 

5.4 As Table (521 but now including the effects of energy dispersion in the data. 

No diffuse background is included [s^ 



5.5 As Table [531 but now including a diffuse background l81 



6.1 A comparison of the DC position obtained for PSR J2021-I-3651 with three 
methods to the (well-constrained) timing position [8j. The "BB" entries cor- 
responding to using a broadband spectral model (respectively a simple power 
law and a power law with exponential cutoff), while "BF" refers to the model- 
independent method discussed in Chapter [H The three methods are self- 
consistent, but inconsistent with the radio position at high confidence [o^ 

6.2 PSR J2021-I-3651, comparison of the DC position to the (well-constrained) 
timing position. This fit includes a new, highly significant source, and the 
Fermi-L AT position now agrees with the timing position to a precision of < 1'. [94 

6.3 The fraction of the photons in a circular ROI with radius Rroi contained 
within a HEALPix (sharing the same center as the ROI) of size 47r/(12A''J 
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6.4 The properties of the 5 milhsecond pulsars detected to date in the GBT 350 
MHz survey. The binary J1302-32 awaits further timing to constrain the 
orbital parameters. PSRs J0023+09 and J1810+17 are likely of the "Black 
Widow" class[SD], while PSR J2215+51 presents eclipses [u^ 

6.5 The five millisecond pulsars discovered in a serendipitous Parkes survey. . . . [l20l 
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GLOSSARY 



LAT: Large Area Telescope, a pair-conversion telescope 

HE: high energy; a class of 7 rays with energies between about 100 MeV and 100 GeV; 

the passband of the Fermi-LAT 

GEV: 10^ electron Volts; in the text "GeV" is also used synonymously with "HE" for 
describing the Fermi-LAT passband, e.g., Fermi-LAT is a GeV telescope 

IRF: instrument response function; a statistical description of the measurement out- 
comes for an incident photon, including (a) the probability of an interaction and 
successful event reconstruction and (b) the distribution of measured variables about 
their true values 

PSF: point spread function; a probability density function describing the random spread 
in reconstructed position of photons incident from a point source 

TKR: the tracker, a section of Fermi-LAT comprising interleaved tungsten foils and 
silicon strip detectors 

CAL: the calorimeter, a section of Fermi-LAT comprising a hodoscopic array of Csl 
crystals 

ACD: anticoincidence detector; a component of Fermi-LAT comprising segments of scin- 
tillator designed to signal the passage of a charged particle into the detector 

FITS: Flexible Image Transport System, a standard for tabular data and images 
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CALDB: a scheme for storing calibration information for instruments and telescopes 

FTl: a high-level Fermi-h AT data product in FITS format; columns give reconstructed 
quantities (e.g., energy, time) for events 

FT2: a high-level Fermi-LAT data product in FITS format; contains information about 
the spacecraft's orientation and position as a function of time 

L: Galactic longitude, a measure of azimuth with origin at Earth referenced to the 
Galactic center 

B: Galactic latitude, a measure of polar angle with origin at Earth referenced to 
Galactic center 

S/C: spacecraft, in particular the vehicle of the Fermi-LAT 

GTI: Good Time Interval, a tuple of a start and end time; events with reconstructed 
times lying within the interval pass some set of criteria to be "good" events 

ML: maximum likelihood; a statistical technique for estimating parameters of a distri- 
bution 

CT: conversion type, a classification for a reconstructed Fermi-LAT event indicating 
whether the event converted in a thin tungsten foil in the "front" of the TKR (CT = 0) 
or in a thick tungsten foil in the "back" of the TKR (CT = 1) 

SNR: signal-to-noise ratio; N.B. not supernova remnant! 
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Chapter 1 
OVERVIEW 

In Chapter [21 we present an introduction to 7-ray astronomy, including the physics of 
7-ray interaction in matter and the resuhing constraints on 7-ray telescopes. We review 
the major components of the Fermi Large Area Telescope (LAT) and briefly describe the 
implementation of similar components in past experiments. We conclude with an overview 
of some of the major sources of 7 rays. 

In the next chapter, we detail the challenges of analysis of Fermi — LAT data. The 
primary consideration is the complex instrument response function (IRF), e.g. the power 
law dependence of angular resolution on energy. We suggest the use of likelihood techniques 
which incorporate the full IRF ab initio. We present the likelihood appropriate for photon- 
counting instruments and, by considering the IRF of the LAT in detail, develop a version 
suited to the LAT. We note that the expressions require multi-dimensional integrals and 
are generally computationally expensive. 

In Chapter IH we discuss the implementation of pointlike, a package for maximum like- 
lihood spectral analysis of Fermi-LAT data. By adopting a binning scheme that scales 
with the detector resolution, we achieve considerable compression of the data at low energy. 
Moreover, we make controlled approximations allowing for accurate but rapid evaluation of 
the likelihood. We detail these approximations formally and show they adhere to a goal of 
1% accuracy. By implementing a fast likelihood package, we open up new science through 
both interactive and large-scale analysis. 

Chapter [5] gives an overview of the validation of pointlike at the top level. We verify that 
the code accurately reproduces Monte Carlo data from the model, i.e. that the approxima- 
tions developed in Chapter H] are sufficiently accurate. We then validate the most important 
functionality of pointlike, viz. estimating spectral parameters and parameter uncertainties. 
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by performing fits of ensembles of simulated sources. 

To demonstrate the potential of pointlike, we present in Chapter [6] some actual pointlike 
analyses. We work through an "interactive" analysis of sources in the Cygnus region, 
conveying the importance of an exploratory approach in identifying new and necessary 
components of the source model. We develop the machinery for an all-sky analysis in which 
the spectra for all sources in the sky are determined consistently. The product of this 
analysis — an excellent model of the GeV sky — enables other large-scale analyses, and we 
detail three: the construction of "test statistic" maps for the detection of new sources, the 
generation of useful visual representations of the data via kernel density estimation, and 
the selection of unidentified LAT with pulsar-like properties. These pulsar- like sources have 
been targeted in radio pulsation searches, and we briefly present the results of two surveys. 

We switch gears somewhat in Chapter [71 We review statistics used for pulsation searches 
{Z^ and Hm) and argue that their use of only time-domain information only is inadequate. 
We modify the statistics to include a weight for each photon, and we use pointlike to 
calculate a weight giving the probability a photon originates from the source being tested. 
By leveraging the additional information in the photon energy and position via the spectral 
analysis, the statistic becomes appreciably more resilient to Type I and Type II error. We 
demonstrate the capabilities with an ensemble of Monte Carlo pulsars and show the weighted 
statistics improve the sensitivity by 50-100%. 

Finally, in the appendices, we present a derivation of the asymptotic distribution of the 
H test [42] . This new result obviates the need for Monte Carlo calibration. We also provide 
an extension of the methods presented in Chapter [7| to sources with periods long compared 
to the time scale on which the LAT orientation changes, allowing sensitive searches for 
orbitally-modulated emission from binary systems. 
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Chapter 2 

GAMMA-RAY ASTRONOMY AND THE FERMI LARGE AREA 

TELESCOPE 



Gamma-ray astronomy is the study of hght in the hmit of very few, very energetic 
photons. The gamma-ray band extends from soft 7 rays with energies of 100 keV — see 
e.g. the instruments aboard the INTEGRAL [85j observatory — up to 100s of TeV, the do- 
main of ground-based imaging air Cerenkov telescopes, e.g., VERITAS [47j . HESS [58], and 
MAGIC |46j. In between these two extremes, from 100 MeV to 100 GeV, hes the high-energy 
(HE) 7-ray band, which we shall also occasionally refer to as "GeV 7 rays" . To study this 
light is the purpose of the Fermi Large Area Telescope and this work. 

To order of magnitude, a blackbody spectrum peaking at 1 MeV indicates a source 
temperature of 10^'^K. Such temperatures occur in only the most extreme and ephemeral 
processes — e.g., core collapse supernovae|86j. In general, persistent HE 7 rays are emit- 
ted through non-thermal processes — inverse Compton scattering, BremsstrahlungU, and 
decay |65j. In ^2.31 we consider some of the sources hosting such processes and shining in 
the GeV. 

Before considering emission, however, we begin with the most important facet of 7- 
ray (or any!) astronomy: reception. Earth's atmosphere is opaque to 7 rays, so GeV 
telescopes are necessarily balloon-borne or space-based. HE source fluxes are generally 
low, and long integration times and reduced atmospheric background favor space telescopes 
(currently, AGILE [8OJ and the Fermi Large Area Telescope). In the following section, we 
give an overview of the relevant detector physics and resulting principles of operation of 
GeV telescopes. 



^We include radiation induced by magnetic fields — synchrotron and curvature — in this category. 
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2.1 Gamma-ray Telescope Principles 

Just as the phenomenology of low-frequency radiation — diffraction, refraction, e.g. — determines 
the design of radio and optical telescopes, so the physics of high-energy particle interactions 
shapes the design of 7-ray telescopes. Accordingly, we begin with an overview of the inter- 
actions of photons, particularly at high energy, in matter. 

2.1.1 7 rays in Matter: Pair Production and Bremsstrahlung 

At long wavelengths, light has a well-defined phase and interacts with matter as prescribed 
by classical electrodynamics |59j . At optical wavelengths, the energy associated with photons 
becomes comparable to atomic binding energies and the photoelectric effect is the dominant 
process. For heavier elements, the binding energy of core electrons reaches X-ray energies, 
allowing for continuing resonances in the photoelectric cross section, while for lighter ele- 
ments Compton scattering beings to dominate the cross section at energies of a few keV. At 
1 MeV (precisely, twice the electron mass), it becomes possible for a photon to produce an 
electron and a positron in the Coulomb field of a nucleus, the field absorbing the necessary 
four-momentum to ensure four-momentum conservation. Stronger Coulomb fields provide 
for a larger cross section with a more rapid onset at threshold. At sufficiently high energies 
(about 100 MeV for heavy elements), the pair-production cross section (a) dominates the 
total cross section (see Figure [2.1.ip and (b) is essentially flat (see Figure [2. l.ip . 

The flat cross section implies that the interaction rate is scale free, and there is effectively 
a mean free path for high energy photonf . 



is commonly called the radiation lengtlj^ 



J. This quantity depends only on the material, and 
% denoted Xq. In this same high-energy regime, the 
dominant interaction for electrons is Bremsstrahlung via interaction with nuclear Coulomb 
fields. (At the low-energy limit of this range, energy loss due to ionization of the material 
becomes important; this critical energy[87\ is material-dependent but is roughly 10 MeV for 
heavy elements.) At first order, the energy loss rate for Bremsstrahlung is dE/dx = —E/Xq, 
i.e. electrons emit on average all but 1/e of their initial energy after traversing a distance 



^Equivalently, the energy loss rate is linear in the incident energy, 
radiation length is 7/9 of the photon mean free path[87]. 
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Figure 2.1: The probability that the first interaction of an incident photon in the given 
material is pair production. By 100 MeV, the cross section for photon interaction in heavy 
elements is dominated by pair production. Reproduced from [87j . 
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Figure 2.2: The photon cross section in carbon (Z=6) and lead (Z=82). The greater binding 
energy of atomic electrons in lead allows photoelectric resonances to X-ray energies, while 
Compton scattering is the dominant process in X-rays for carbon. In both materials, pair 
production on nuclei becomes dominant at high energies, where the cross section is essen- 
tially flat. The absolute value of the cross section is significantly higher for lead, as a oc Z^. 
Reproduced from [57] . 
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Xo. 



These conditions lead naturally to an electromagnetic shower in which an incident pho- 
ton produces an e~ je^ pair, each member of which in turn radiates high energy photons, 
leading to an exponentially-growing cascade of electrons and photons. As long as the next- 
generation particles remain well above the pair-production threshold (photons) or critical 
energy (electrons), the shower continues to develop exponentially. Since the produced eT je^ 
are highly relativistic, the pair-production cross section in the lab frame is sharply peaked 
along the initial photon trajectory. The Bremsstrahlung is in turn beamed in a narrow cone 
about the electron trajectory. The resulting well-contained showers allow inference about 
the properties of the incident photon. 

An electron traveling in matter is subject to Coulomb scattering from surrounding atoms. 
Unlike the significant momentum change from Bremsstrahlung spurred by close approach to 
a nuclear field, the interactions with distant charges are mere "love taps" perturbing the path 
of the electron. For a sufficiently thick piece of material but small overall perturbations, this 
multiple Coulomb scattering is like a random walk, and the particle exits the material with 
an approximately Gaussian distribution about its initial direction. The standard deviation 
of this distribution is given in the relativistic limit bv[87] 



with X the length of material traversed measured in radiation lengths. 

These basic physics considerations mean that a GeV telescope must be a full-fledged 
particle detector capable of tracking charged particles and measuring the energy of showers. 
In the next section, we consider the components a typical GeV telescope needs to accomplish 
these tasks. 

2.1.2 Components of a GeV Telescope 

The nature of interaction of GeV photons with matter sets strong constraints on telescope 
design. The only available detection process is conversion of the incident photon into a pair, 
i.e., there is no focusing of 7 rays. We have the following results: 

• converting the bulk of incident photons requires sufficient radiation lengths of material; 




(2.1) 
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• maximizing the pair-production cross section (relative to, e.g., Compton scattering) 
and the electron critical energy requires a high-Z converter; 

• obtaining a good estimate of the photon trajectory requires measurement of the posi- 
tion of the first generation pair before it has suffered significant multiple scattering; 

• obtaining a precise measurement of the energy requires converting, containing, and 
measuring the bulk of the electromagnetic shower, i.e., many radiation lengths for 
conversion and additional material to measuring ionizing charged particles and radi- 
ation. 

These constraints essentially design 7-ray telescopes for us, and nearly all successful GeV 
telescopes have used some form of the following components: 

• A combination tracker /converter module. To minimize multiple scattering and pre- 
mature Bremsstrahlung, thin, high-Z metal converter foils are interwoven with active 
material capable of tracking charged particles. To increase the fraction of converted 
photons, multiple conversion foils and tracking layers may be stacked. 

• A calorimeter. After having obtained position information, it is desirable to contain 
and measure as much of the shower as possible. The calorimeter should bring high 
energy particles out of the radiation range (i.e., be many radiation lengths deep) and 
then measure the energy of the ionizing daughter particles. 

Additional components require consideration of the telescope environment. The charged 
particle number flux — from primary cosmic rays and secondary cosmic rays trapped in 
the earth's magnetosphere — is about from 10^ to 10^ that of the HE 7-ray flux, and this 
considerable background must be rejected. Second, the earth is a bright source of 7 rays 
(see below and Chatper[3|), and it is desirable to reject 7 rays entering the instrument from 
below. Thus, we require two more components: 

• An anticoincidence detector that signals the passage of a charged particle. 
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• A mechanism to reject upward-going (zenith bound) 7 rays. 

Having now identified the guiding physical principles and the major components of GeV 
telescopes, we consider the implementation of the most sensitive GeV telescope to date, the 
Fermi Large Area Telescope. 

2.2 The Fermi Large Area Telescope 

As we observed in the last section, a GeV telescope is naturally modular, and accordingly 
we organize this section around the components of the Fermi Large Area Telescope (LAT). 
To provide context and to acknowledge the significant scientific achievement of previous 
GeV experiments, we intersperse discussion of the LAT components with remarks on the 
implementation adopted in foregoing telescopes and the discoveries made possible by the 
technical advances. 

2.2.1 The Tracker 

The tracker/converter (TKR[2T]) is the heart of the LAT. Enabled by advances in solid 
state particle tracking technology, it provides the sharpest view to date of the GeV sky. To 
balance the tension between efficiency (governed by the total radiation lengths of converter 
material) and angular resolution (governed by multiple scattering in the converter material) , 
the TKR is composed of 16 alternating layers of tungsten (Z=74) and tracking component^- 
The first 12 foils— referred to as the "front" of the TKR— are 0.028Xo, while the final 4 
foils — the "back" — are O.ISXq. Eq. 12.11 thus indicates that photons converting in the back 
layers suffer approximately twice the angular deviation of those converting in the front. The 
total number of converted photons are roughly equally apportioned between the front and 
the back by the ratio of Xq in the two TKR sections. 

Following each tungsten layer is a silicon strip detector (SSD) with a pitch of 230 mi- 
crons. The SSD comprises two planes with orthogonal strip orientations, allowing for a 



^An additional two layers of tracking planes without conversion foils lie at the bottom of the TKR, for a 
total of 18 tracking layers. 
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measurement of the transverse (to the detector plane) position of a charged particle. The 
fine pitch and numerous layers require nearly one million readout channels |22j. 

The TKR is not monolithic. It is segmented into a 4x4 array of 16 identical modules, 
each in turn made up of the 16 (18) conversion (tracking) layers described above. 

Although using modern technology, the LAT TKR is also evolutionary. The first 7- 
ray satellite telescope, OSO-3, had no tracking capabilities and instead relied on a rough 
collimation based on the anticoincidence veto, giving it a ~ 20° angular resolution. Even this 
rough discrimination was sufficient to detect diffuse emission from the Milky Way|63j (see 
below), but it was not until spark chambers were incorporated in SAS-2[49j and COS-B[27] 
that sufficient resolution and sensitivity to detect point sources was acquired. The spark 
chamber reached its zenith in EGRET |621 183j. where the 28 layers of tantalum- interleaved 
spark chambers (with 0.8mm wire spacing) achieved a high-energy resolution of 0.4°. 

2.2.2 The Calorimeter 

To measure the energy from an incident photon, the LAT is equipped with a hodoscopic 
calorimeter (CAL[60j). As with the tracker, it is segmented into 16 autonomous modules. 
(Together, the tracking and calorimetry elements and their electronics form a "tower".) 
Each of these modules is formed from 96 2.0 x 2.7 x 33.6 cm cesium iodide (Csl) cyrstal 
scintillator bars arranged in an 8 (depth, along instrument axis) by 12 (breadth) array 
with the long dimension in the same plane as the TKR layers. The orientation of the long 
dimension shifts by 90° in alternating layers. 

The crystals serve both to continue to seed pair production and Brehmsstrahlung and, 
once the electromagnetic cascade has reached its maximum (particles are at the critical en- 
ergy/ pair production threshold), the crystals scintillate as the main energy loss mechanism 
becomes ionization the Csl atoms. Two photodiodes attached to each end of the crystal's 
long dimension allow for readout and a measurement of the transverse position of energy 
deposition by light asymmetry. 

The total CAL is 8.4Xo on-axijf], which depth becomes insufficient to fully contain show- 



^The CAL contains about 1800kg of Csl, constrained by the payload limit of the Fermi launch vehicle. 
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ers initiated at high energies. The segmentation allows characterization of the longitudinal 
profile of the shower and hence an estimate for the fraction of energy that escapes the CAL. 
The segmentation also allows an accurate measurement of the centroid of energy deposition 
which serves as an anchor point in track reconstruction algorithm [22]. 

While the depth of the CAL is similar to the calorimeter flown on EGRET, the greater 
transverse area of the CAL enables the wide field-of-view of the LAT. Beyond facilitating 
track reconstruction and energy measurement, the imaging capabilities of the CAL also 
obviate the need for a dedicated time-of- flight coincidence system for rejecting upward- 
going 7 rays; EGRET, SAS-II, COS-B, and OSO-III all made use of such a system. 

2.2.3 The Anticoincidence Detector 

The charged particle background flux can exceed that of the desired 7-ray signal by up 
to 10^[68j. While the TKR and CAL can reject hadronic showers with relatively high 
efficiency, cosmic ray electrons generate cascades very similar to those initiated by photons. 
These too, of course, can be vetoed at the cost of efficiency by requiring "empty" TKR 
planes above the conversion point of the photon. However, in the face of complex analysis 
and extremely high backgrounds, it makes much more sense to attempt to detect charged 
particles independently. This function is provided by the Anticoincidence Detector (ACD), 
a plastic scintillator-based detector covering the top and sides of the LAT. 

The ACD comprises 89 scintillator tiles, each with independent and redundant readout. 
Spaces between tiles are filled with ribbon scintillators. By segmenting the ACD, self- 
veto from "backsplash"y is significantly decreased [68 j. The segmented ACD of the LAT is 
a significant improvement on those of previous experiments, which were largely monolithic 
and thus suffered from appreciable self-veto. The sensitivity of EGRET, e.g., was essentially 
zero above 50 GeV[68j. 

® "Backsplash" denotes the phenomenon of soft X-rays produced in the electromagnetic cascade of a high- 
energy (above « lOGeV) photon travehng back through the TKR and into the ACD. Some of these X-rays 
will Compton scatter, causing an ACD trigger. A segmented calorimeter allows for a fine-grained veto, as 
veto can be restricted to only events with a track extending to a triggered ACD tile. 
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2.2.J^ Orbital Environment and Operating Modes 

The LAT was launched on June 11, 2008 from Cape Canaveral mto an orbit with an altitude 
of ~ 565 km inclined ~ 26° from the equator. The orbit precesses with a period of about 
53 days. The S/C typically operates in a "sky survey" mode in which with each orbit it 
rocks away from the zenith by ±0°, alternately viewing the northern and southern orbital 
hemispheres. The initial rock angle was D = 35° but has since increased to D = 50° to 
reduce the operating temperature of the S/C battery. 

If the onboard software of the LAT or the GBM (Gamma-ray Burst Monitor) detects 
a signal consistent with a gamma-ray burst, the S/C can execute an Autonomous Repoint 
Request in which the sky survey mode is abandoned and the S/C maintains an attitude 
to keep the putative burst site near the center of the field-of-vie^Ml|. Since such pointing 
strategies allow the S/C axis to approach the limb of the earth, these periods are typically 
excised in standard analysis. 

As noted in the previous section, the charged particle background for the LAT is con- 
siderable. The typical trigger rate is ~ 2.2kHz, but can increase by about a factor of 2 
during some orbits as the earth's magnetosphere rotates in the plane of the S/C orbii|f. 
Reading out the LAT takes about 26 ^s, leading to deadtime of order 5%. On the majority 
of orbits, the S/C encounters the "South Atlantic Anomaly", or SAA. Within this region, 
the particle background is significantly higher than typical for the LAT's orbit, making 
nominal observations impossible. When the S/C enters the SAA, the LAT ceases to trigger, 
and the time spent traversing the region is lost to normal science operations. This episodic 
deadtime decreases the LAT duty cycle by about 10% and leads to a modest north-south 
asymmetry in the exposure. 

^As with many telescopes, viewing a source directly on axis is a poor stategy since the "cracks" in the 
detector are then aligned with incident photon trajectories. 

^Background leakage is then also time-dependent on < Id timescales; care is required when searching for 
periodic signal from short-period binaries! 
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2.3 Sources of Gamma Rays 

To complete the overview of 7-ray astronomy and Fermi, we take a brief look at the classes 
of 7-ray sources filling the high-energy heavens. 

2.3.1 Dijfuse Emission from Cosmic Rays 

Cosmic rays produce GeV radiation via three primary mechanisms [18]. At energies below 
and to the edge of the LAT passband, the emission is dominated by electronic processes, 
particularly Bremsstrahlung (induced by gas) and inverse Compton (IC) scattering of am- 
bient light, e.g. the cosmic microwave background. Coincident with the low end of the LAT 
passband, the contribution of hadronic processes turns on rapidly. The inelastic scattering 
of high energy protons in gas nuclei produces vr"*", ir^, and vr" particles in roughly equal 
numbers, and the dominant decay process for vr" (98.8% branching ratio|87j) is to two pho- 
tons. The 7 spectrum depends on the vr*^ spectrum, which in turn depends on the cosmic ray 
proton spectrum and the cross section for inelastic nucleon scattering. This cross section 
turns on rapidly at the threshold for vr*^ production [511 [61], leading to the characteristic 
rapid rise of the observed diffuse 7 spectrum in the LAT band[6]. 

At the high end of the LAT passband (> 100 GeV), electronic processes can again 
become important via IC scattering of the interstellar radiation field. 

In all of these processes, the 7-ray production depends on both the cosmic ray energy 
density and the "target" density — gas for Bremsstrahlung and radiation for IC scattering, 
leading to significant variations of the appearance of the HE sky as a function of energy, a 
topic we take up in a brief discussion of emission in our own Milky Way. 

The Milky Way 

As can be appreciated from the representation of the 7-ray sky in Figure 16.31 the dominant 
feature is diffuse emission associated with the structure of the Milky Way. The intensity 
of the low latitude emission is a projection effect as the emissivity scales as the product 
of the cosmic ray density and the target density. This sharp peak in intensity [14 j leads to 
difficulty in the analysis of point sources in the plane [T2]. Emission from higher latitude 
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represents cosmic ray interactions in our local environment [6]. 

Starhurst Galaxies 

If supernovae shocks provide the accelerators for cosmic rays, starhurst galaxies |64j. with 
their intense patches of high star-formation rates, should be modestly bright 7-ray sources, 
as the high supernova rate and enhanced dust concentration (target material) lead to sig- 
nificantly elevated 7-ray production relative to, e.g., the Milky Way. The detection by 
Fermi-LKT of two starhurst Galaxies [64J confirms this idea, providing both a new oppor- 
tunity for the study of cosmic ray production and insight into the observed extragalactic 
7-ray background. 

The Earth 

As with much other emission from the Solar System (e.g., the moon and the quiescent 
sun [30], emission of 7 rays from Earth has its origin in cosmic rays[4j. Due to its proximity, 
the limb of the earth is in fact the brightest source of 7 rays seen by Fermi, requiring care 
in separating its signal from that of celestial sources, see e.g. Figure [3. 1.21 and discussion in 
SX21 

2.3.2 Active Galactic Nuclei 

The center of most galaxies are known to harbor massive > IQ^Msun black holes. While 
most are quiescent (e.g. our own Sgr A*), accretion at ~ 10% of the Eddington rate onto 
these black holes can fuel powerful relativistic, collimated outflows known as jets. These 
jets are often observed to have apparent super luminal velocities (e.g. [29]) implying the 
jet material has Lorentz factors of order 10 and the motion is largely along the line of 
sight. Such motion leads to a Doppler boost in both luminosity and apparent energy, 
making emission visible from cosmological distances. GeV emission is produced through IC 
scattering of seed photons, which may be produced by synchtron emission by the electrons 
in the magnetic field of the jet or from sources external to the jet, e.g., the bright accretion 
disk[77J. AGN bright in the GeV are known as blazars and represent the class of AGN 



15 



with strong jets aligned very closely with the line of sight. The class breaks down into 
flat-spectrum radio quasars and BL Lac objects, and Fermi sees roughly equal numbers of 
each [16], An appreciable component of the observed isotropic diffuse emission|15j is likely 
contributed by unresolved blazars. 

2.3.3 Pulsars 

Pulsars are rapidly-rotating, highly-magnetized (10^ to 10^^ gauss) neutron stars. The 
intense currents induced lead to radiation emission over a broad energy range. Although 
pulsars have a storied history in radio[53, -Fermi-LAT has essentially opened up the study 
of pulsars in GeV, e.g., with the detection of millisecond pulsars [2], the detection of tens of 
potentially radio-quiet pulsars [HITS], and thus we concentrate on their high-energy emission. 

In canonical emission models, the acceleration of charged particles to very high energies 
and subsequent production of 7 rays through synchrotron/curvature radiation or inverse 
Compton scattering relies on the "gap" paradigm. A relative vacuum — a gap — develops in 
a magnetosphere otherwise filled with charged particles configured such that E • B ~ 0[52j. 
The unshielded potential in the gap is sufficent to accelerate particles to ultrarelativistic 
energies, powering pair cascades which produce the observed GeV emission and allow current 
to fiow. 

Emission models are primarily distinguished by the site of the gap (Figure I2.3.3P . Polar 
cap models[38l |39] place the gap immediately above the polar cap, the small patch on the 
neutron star surface defined by the footprint of the open magnetic field lines. (A field 
line is open if it crosses the light cylinder (LC), the surface at which a corotating particle 
would attain light speed.) Outer gap models pH [36l EU [73] invoke a gap at higher altitude, 
along the last closed field lines. The gap extends from the null charge surface — where the 
corotating charge density oc • B = changes sign|52) — to the LC. Slot gap models [56] 
postulate narrow gaps encircling the PC and rising along the last closed field lines to the 
outer magnetosphere. The two-pole caustic model[45] is essentially a slot gap model with 
emission from a vanishingly thin surface along the last closed field lines. The annular gap 
model of [23j illuminates field lines along a thin annulus interior to the last closed lines. 
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Figure 2.3: The canonical representation of a pulsar magnetosphere, reproduced from [45j . 
The magnetic moment of the neutron star (NS), /i, is inclined to the NS spin axis, fi, by 
an angle a. The field is assumed to be dominantly dipolar, though higher multipoles may 
be important [88] • The Goldreich-Julian equilibrium charge density[52j pcj vanishes along 
the null charge surface. The small patch on the NS surface at the foot of the open B field 
lines is the polar cap. The slot gap (dashed lines) and outer gap (hatching) lie along the 
last closed (open) field line. HE emission is beamed along B field lines, and the observer's 
inclination determines which field lines are visible, strongly affecting the resulting light 
curve. 
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Relativistic effects, such as aberration, become important in outer-magnetospheric emission 
and naturally furnish the characteristic cusped light curves of many HE pulsars [3^ [5], 

A second important consideration is the assumed structure of the magnetic field. Most 
calculations employ the solution of Deutsch|43j. which posits an infinitely-conductive star 
and a vacuum exterior. At the other extreme are force-free magnetospheres in which the 
star exterior is modeled as a highly-conductive fluid and numerical solutions are derived 
with magnetohydrodynamic codes[79i. The true solution must combine elements of both. 
The two approaches differ most in the outer magnetosphere, where the co-rotation velocity 
becomes relativistic and the modification of field lines by plasma becomes important. Since 
LAT results point to a primary emission site in the outer magnetosphere [3 [2], this distinc- 
tion becomes even more important. In both models, the field geometry depends strongly 
on a, the inclination of the dipole axis to the spin axis (Figure [2. 3. 3p . 

2.4 Summary 

The myriad sources of 7 rays — which we have only touched on here — offer opportunities to 
study some of the most extreme processes in the universe. However, the physics of 7-ray 
interactions require non-traditional telescopes, viz. particle detectors. By taking advantage 
of advances in solid state technology, the Fermi-LAT offers unprecedented angular resolution 
and sensitivity, particularly at energies > 10 GeV via suppression of self-veto. However, 
as we shall see in the following sections, the relevant physics still set fundamental limits 
on the capabilities of the detector. In particular, multiple Coulomb scattering leads to an 
angular resolution that varies by more than two orders of magnitude over the LAT passband. 
Dealing with this multi-scale response requires care, and in the next chapter we introduce 
the principle of maximum likelihood, leading to techniques optimally suited for analysis of 
Fermi-LAT data. 
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Chapter 3 

LIKELIHOOD ANALYSIS FOR HIGH-ENERGY GAMMA-RAY 

TELESCOPES 

Analysis of data from Fermi-h AT and other HE gamma-ray telescopes (e.g., EGRET [66]) 
requires techniques not normally found in analyses of data from other wavebands. In optical 
astronomy, for instance, observations are typically made of a narrow range of wavelengths, 
and the angular resolution of the instrument is sufficient to resolve the astrophysical back- 
ground. To determine the flux from a star, e.g., one identifies the pixels in the CCD illumi- 
nated by the image of the star (aperture photometry), measures the collected charge, and 
divides by the telescope aperture, time of observation, and efficiency of the filter and CCD. 
In practice, of course, these steps may be quite involved, but the point remains that physi- 
cal quantities, e.g., source intensities, can generally be extracted directly from the data. In 
addition, time-dependent instrumental effects are typically small, as sources can be tracked 
with great precision, resulting in a stable instrument response over the integration. 

Such methods fail for Fermi-LAT data for two reasons. First, very few Fermi-hAT 
sources appear isolated. Looking at Figure El we note the absolute scale of the PSF at 
energies of order 0.1 GeV is about 5°, and an immediate consequence is source confusion. 
Suppose we want to extract the flux between 0.1 and 0.3 GeV for a particular source. 
Except for the brightest blazars at high Galactic latitudes, there is generally no circle we 
can draw that is dominated by emission from the source of interest. The situation is dire 
in the Galactic plane, as the projected density of point sources increases dramatically along 
with the diffuse background from cosmic rays. At higher energies, the PSF scale drops to 
< 1° and the situation improves. By carefully calculating the integral of the PSF over an 
extraction radius, one could reliably extract fluxes for sources at high Galactic latitude by 
photon counting. Sources in the Galactic plane, however, still suffer strong contamination. 
In general, then, any technique that relies on simply apportioning photons to a source will 
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deliver poor results. 

Second, detecting and characterizing GeV sources may require integrations of a year or 
more. The orientation of the instrument with respect to the source is constantly changing 
as the S/C orbits the earth, so there is no single "number" characterizing the instrument 
characteristic during the observation. In essence, we are using a different telescope every 
few minutes. Even if we could estimate accurately the number of photons we have observed 
from a particular source, we cannot straightforwardly convert it to a flux. 

These considerations and others discussed below motivate likelihood techniques for the 
analysis of Fermi-L AT data. Such techniques work in the "forward-folded" space, i.e., not 
with physical quantities but with instrumental quantities, and they naturally incorporate 
the complexity of the energy-dependent PSF and time-dependent observation. Below, we 
introduce likelihood techniques in general and then specialize to GeV instruments and the 



3.1 Characterizing and Modeling Sources 

We characterize a source by its photon flux density. (In the following, we neglect polar- 
ization, since it is very difficult for a pair-conversion telescope to measure the polarization 
of an incident beam.) The particle flux density — the rate of photons incident per unit en- 



i.e., —cp/E, with p the photon momentum vector. Although the distinction is seldom of 
interest, t is understood to be the retarded time. 

Unlike infrared, optical, or even X-ray sources, the mechanisms that power HE gamma- 
ray sources almost universally produce broadband spectra, that is, spectra spanning multiple 
decades of energy and possessing no features and modest curvature. Such spectra can be 
modeled with only a few parameters, and often a simple power law suffices: 



LAT. 



ergy/area/time from a solid angle dO, about the position Q — completely encapsulates the 
properties of a source and is denoted T{E,t,Q,). Its arguments are the observables asso- 
ciated with a photon, namely energy (E), time of arrival {t), and direction of origin {^), 




(3.1) 
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Energy (MeV) 

Figure 3.1: The Fermi-hAT point-spread function as characterized by "r68" and "r95", the 
angular separation from a point source within which 68% and 95% of photons from the 
source He. The description here is for photons arriving close to the instrument's boresight; 
the angular resolution drops by up to 50% at the edge of the field of view (see Figure I4.1.2p . 
As discussed in Chapter [2l the energy-dependence originates from multiple scattering of 
pair-produced electrons and positrons in the tungsten foils, and the asymptote is set by the 
pitch of the silicon strips in the tracking layers. 
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A simple extension, the exponential suppression of the flux above a cutoff energy, 

^ _r(t,f1) / ^ \ ^ 

F{E, t, n; N, r, E„ Eo) = N{t, n) ( — ) exp f{n), (3.2) 

\EoJ y Ec{t,9.)) 

is adequate to characterize the emission from most 7-ray pulsars[5j. Here, f{^) is a nor- 
malized function (J dVt f{^) = 1) describing the spatial morphology of the source. We are 
mostly interested in point sources, i.e., those that cannot be spatially resolved and for which 
the spatial dependence is given by the Dirac delta function, /(O) = 5(0 — VLq), where VLq 
is the true position of the point source. In general, we will not specify a particular form of 
spectral model and instead label it with a set of parameters, A, and express the modeled 
flux density for a source as F{E^ t, fl; A). 

3.2 Principles of Maximum Likelihood 

With a model for the physical properties of sources in hand, the next step is to connect 
them to data and estimate the model parameters. Below we outline the method of maximum 
likelihood (ML) parameter estimation which shall be our primary tool for this task. 

Suppose an experiment is designed to measure a random vector X , that the random 
variables admit a probability density function (pdf), and that the pdf is parameterized by 
some vector A. We express the pdf as f^{x; Xjj, meaning that the probability to measure 
a value near x is just fj^{x]\)dx. Viewed this way — being given A — the probability is a 
function of if, the possible outcomes of our experiment. On the other hand, having performed 
the experiment and measured x, we can view the same expression, fj^{x; A), as function of 
A, i.e., it characterizes the parameter values most likely to have yielded the measured values. 
When viewed as a function of the parameters, this quantity is called the likelihood and we 
denote it C{X;x), although we often suppress the explicit dependence on x. 

The likelihood is useful for the common statistical/analysis task of determining esti- 
mators for the parameters of a distribution from data: natural estimators are those which 
maximize the likelihood, known as the maximum likelihood estimators, or MLE. In many 
cases, the MLE are consistent, have an asymptotically normal distribution, and are efficient. 



^The notation for pdfs is discussed more fully in Chapter [T] 
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That is, with enough data, the estimators converge to their true values, the parameter un- 
certainty can be estimated from the shape of the hkeUhood, and the uncertainties are the 
smallest possible. 

For large data sets, the likelihood is likely to be a very small number. Since the logarithm 
of monotonic functions is again monotonic, we can instead maximize the "log likelihood" , 
which is also often easier to compute. 

We note in passing that, via Bayes' Theorem, the likelihood can be converted to the pos- 
terior probability by multiplication with the prior probability. That is, since f{x; A) /(A) = 
f{X;x)f{x), the posterior probability density, /(A;x), is proportional to C{\;x)f{\). Here, 
/(A) is the prior probability, or the distribution for the parameters we would expect in the 
absence or any data, or in light of previous experiments. This interpretation allows one to 
interpret the statistical errors on the estimators as uncertainties on the true values, e.g., 
to construct credible intervals which have a 90% chance of containing the true value of the 
parameter. 

3.3 The Observed Signal 

As outlined above, likelihood provides a connection between model parameters (as close 
to physical measurements as we can hope to get given the experimental constraints) and 
data. Implicit (and perhaps hidden) in the simple notation we employed is the mapping of 
the source models onto the actual data. That is, if we want to calculate the probability to 
observe some set of data (in the case of GeV instruments, some collection of photons) given 
some model, we must first figure out what that model looks like to the detector. The first 
step is understanding how a signal from a source is processed by the instrument. 

A GeV telescope is essentially a particle detector, and the fundamental data are events. 
The detector, of course, is imperfect: it (a) fails to generate observables for all incident 
photons, and (b) introduces error into the observables for successfully detected photons. 
To illustrate, suppose some number of photons from a steady, monoenergetic point source 
impinge upon the detector. For the LAT, the tungsten foils in the tracker provide about 



23 



Xq = 1.1 radiation lengths |22] , and ~ 40% of photons at normal incidence! will pass through 
the TKR without interacting [21]. These photons cannot be successfully reconstructec|f|. Of 
the photons that interact in the TKR, some will fail to produce an identifiable signal^. And 
of course, for those that interact and are successfully recognized as interacting photons, we 
will see a spread in the reconstructed energies and position. 

These effects, both the efficiency for successful reconstruction of an incident photon and 
the dispersion of its true observables are characterized by the instrument response function, 
or IRF. In keeping with its two roles, we explicitly factor the IRF into two pieces. The 
instantaneous sensitivity, with units of area and denoted e{E, t, Q), is the quantiy by which 
the flux density should be multiplied to determine the instantaneous count rate in bin of 
width dE X dtx dO.. The dispersion matrix, P{E' , t' , ^}'\E, t, gives the probability that a 
photon with true energy E, time of arrival t, and position will have a reconstructed energy 
E', time of arrival t', and position ^1' . The dispersion matrix is a true probability density 
function and is normalized such that Jjj dE d^ldt P{E' ,t' ,0,'\E,t,Q) = 1. The expected 
event rate in an infinitesimal bin centered on E', t', and 17' is then 

r{E',n',t';X) = Iff dEdndtT{E,n,t;X)e{E,n,t)P{E',n',t';E,n,t). (3.3) 



The integral is over the support of J^{E, J7, t; A). 

This quantity — the source flux as processed by the instrument — is what we need to 
calculate the likelihood. Before we proceed, we first describe in some detail the Fermi-hAT 
IRF and make some simplifications to Eq. 13.31 

3.4 The Instrument Response Function of the Fermi-LAT 

The calculation of the terms €{E, J7, t) and P{E' , fi', t'; E, fi, t) above requires some care for 
space-based gamma-ray telescopes such as the Fermi-LAT. First, we note that the response 

^The apparent depth of a tungsten foil is inversely proportional to the cosine of the incidence angle. 

''in principle, events can be reconstructed using the hodoscopic calorimeter only, but this technique is 
not yet part of standard Fermi-hAT analysis. 

*The cross section for pair production in tungsten is essentially flat in energy from a few hundred MeV 
to 100s of GeVlST, so the increase in efl^ective area (Figure [33| between 100 MeV and 1 GeV is primarily 
due to increasing performance of the event reconstruction algorithms. 
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of the LAT to an incident photon depends primarily on the geometry, viz. the angle between 
the photon's momentum vector and the instrument's boresight, which by convention defines 
the z-axis in the S/C frame, z. While this dependence is both on the polar angle, Vt ■ z{t) = 
cos 9{t) and the azimuth, for events not too far from the boresight, the azimuthal dependence 
is subsidiary to the polar dependence. Furthermore, over long integrationjf], the azimuthal 
dependence tends to average out (see, e.g., Figure [3^ . Thus, we express the geometry 
solely in terms of the polar angle, which depends on the position and orientation of the S/C 
as it orbits. 

We consider first the term e{E, 0, t), which is essentially the effective cross section of the 
detector at time t. (As noted above, not all events can be reconstructed, so the effective area 
is always less than the geometric cross section.) The term factors into the effective area, 
A[E, cos 6(t)], with units of area, and a dimensionless efficiency, which we denote e{E,t), 
a number between and 1. The effective area accounts for the ability of the instrument 
and software to successfully reconstruct a photon with energy E incident at an angle 9. It 
gives to first order the true collecting area of the detector. We show the effective area for a 
particular version of the reconstruction algorithms, P6-v3_diff, in Figure [331 The efficiency 
comprises two terms. First, an energy-independent term depends on the trigger rate, as 
the instrument has a period of deadtime following each trigger while the electronics are 
read out. Second, there is an energy-dependent effect resulting from "pileup". For about 
10% of triggers, an additional charged background particle is interacting in the detector, 
degrading the reconstruction of the triggering particle and decreasing the efficiency. The 
effect decreases with incident photon energy. Since the trigger rate is dominated by charged 
particles, both of these effects depend on geomagnetic latitude and hence on time. 

Next, we consider the term P(E' ,Q' ,t'; E,Q,t). We make the simplifying approxima- 
tion that the three observables are statistically independent and write P{E' , Q' ,t'; E,Q,t) = 
f psi[0,';^i, cos 9 {t),E] X /cdisp[-E'; cos 6* (*),£'] x f tdisp{t' ; E,n,t), with /psf the point-spread 
function, /edisp the energy dispersion function, and /tdisp the time dispersion. This assump- 
tion is certainly justified for t' . The position (energy) reconstruction is dominated by the 

^For transients such as gamma-ray bursts, only a short selection of data is used, and this approximation 
clearly fails. 
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Figure 3.2: The distribution in azimuth for a set of reconstructed events (P6 "diffuse" class) 
with 100 MeV < E < 100 GeV in a 10° aperture centered on (l,b) = (0,-45), integrated from 
Aug. 4 2008 to Jul. 18 2010. Events with a reconstructed zenith angle > 105° have been 
discarded. (See Figure H . 1 . 2 1 which charts the zenith angle for this same data.) The Fermi- 
LAT has twofold bilateral symmetry, so we can obtain the "unique" azimuth angles, i.e., 
those that truly probe the dependence on the shape of the detector, by (j)unique = 4> mod 90° 
and, further, reflecting events with (fiunique 

> 45° to 90° - 45°. The red horizontal hue 
indicates the mean, while the blue gives the reconstructed unique azimuth in 1° bins. The 
deviation from the mean is quite modest. 
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Figure 3.3: The effective area for Fermi-LAT as described by the P6-v3-diff IRF. The 
lefthand column shows the effective area of the front section of the detector (see ^2.2.ip and 
the righthand column the back. The top row shows the joint dependence of the effective 
area on energy and incidence angle. The second row shows the dependence on energy for 
a set of incidence angles, while the third row shows the dependence on incidence angle for 
a set of energies. We note that effective area is essentially zero below 100 MeV, turns on 
rapidly, and becomes essentially flat at 1 GeV. This behavior is governed by the difficulty 
in reconstructing "soft" photons. 
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signal in the TKR (CAL), so to first order E' and ^1' are also independent. We discuss each 
of these terms below. 

3.4-1 The Point-spread Function 

The point-spread function (PSF) lies at the heart of point source analysis. The spatial 
distribution of photons on the sky is the single most important handle for disentangling 
confused sources. Accordingly, we take some care in characterizing and (implementing) the 
PSF. 

As with the effective area and energy dispersion, the PSF is determined by end-to-end 
Monte Carlo simulations of the detector. The Monte Carlo events are binned in (true) 
energy and (true) incidence anglqj. The resulting distribution of reconstructed incidence 
angle is fit with a parameterized analytic function. These parameters describe the PSF over 
the instrument's phase space. 

The PSF by definition is the probability density to observe an event in a small piece of 
solid angle dfi centered on 17 given that event originated from f^o- That is, for some energy 
and incidence angle, suppressed here, the PSF is 

dN{n) _ dN{Q) _ 1 dN{cose) _ ldN{9^) 
dCl d(f) d cos 6 271 d cos 9 vr dO'^ ' 

where (j) is the azimuthal angle and 6 the polar angle measured with respect to the origin 
r^O) and where we have made the approximations of azimuthal symmetry and small angular 
deviations. We see that the natural variable is 0^, the squared angular deviation, as isotropic 
sources {dN/dVt constant) are flat in this quantity. This motivates the definition of u = 
0^/2(T^, in terms of which 

dN{^) ^ 1 dN{u) 
dn ^ 2-71 du ■ ^ ' 

While we could in principle use the Monte Carlo histograms directly as a representation 

of the PSF, a fitted analytic function is much more convenient; we thus need a function 

dN{u)/du that adequately characterizes the instrument performance. The functional form 

adopted for the PSF is the King function which arises in characterization of many-body 

®The spectrum of incident events is uniform in logarithmic energy. 
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gravitational dynamics |28j. Mathematically, it is a precursor to the Gaussian distribution. 
That is, if the King function is defined as 

fk{u,j)= + \ (3.6) 



7 



then since 

-7 



lim ( 1 + - ) = lim /fc(n,7) = exp(-n), (3.7) 
7— >oo \ 7 y 7—^00 

the King function can be seen as a "finite 7" version of the exponential. Since u = 6'^/2a'^, 

we see that the limiting form is a Gaussian in the angular deviation. Thus, the smallness 

of 7 characterizes how far we are from the "ideal" Gaussian PSF, and a sets the overall 

angular scale. (Note that 7 > 1 for the function to be well-behaved.) We see that, unlike 

the Gaussian, the King function has power law tails with slope —7 for n >> 7, and is thus 

more suitable for characterizing a PSF with broad tails. A King function fit to Monte Carlo 

data is shown in the top panel of Figure 13.4.11 

Thus, for a given energy and incidence angle, the PSF is described by two parameters, 

a and 7. In practice, these parameters are evaluted for bins in energy (4 bins per decade) 

and incidence angle (bins of width 0.1 in cos(0)) and stored in a FITS table as a part 

of CALDB, the repository for instrument calibration information. In order to retain the 

smooth energy dependence of the underlying behavior and to reduce the steepness of the 

parameter dependence on energy, the a parameter has a built-in energy dependence. That 

is, fJ EE (TCALDB X fa{E), with 

UEf ^ {piiE/WOMeV)-P'f+pl (3.8) 

with pi and dependent on the conversion type of the event and p2 = 0.8 in current 
versions of the IRF. p^, the high-energy asymptote, is governed by the pitch of the silicon 
strips in the tracker. (Tcaldb is the parameter value stored in CALDB and is typically of 
order unity; pi, p2, and p^ also appear in the CALDB. 

The PSF must be normalized. By integrating fk{u) from to 00, we see fkiu)' = 
(1 — l/^)fk{u) is normalized to 1. Thus, the final (King function) form of the PSF becomes 

' 1 A_iUi + J!_y' (3.9) 
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Figure 3.4: Monte Carlo data from a detector simulation for a bin centered on 1334 MeV 
for front-converting events. The events have been summed over incidence angle up to 66.4°. 
The behavior of the density function over many orders of magnitude is well-described by 
a single King function, i.e., that data show a sharp core and power law tails. However, 
there is a slight break in the slope of the power law tail, and accordingly the King function 
cannot provide a perfect fit, leaving residuals > 10%. In later versions of the IRF, this issue 
is ameliorated by the use of two King functions. 
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In addition to determining the asymptotic behavior at large u, 7 also determines the 
transition point at u = 7. This dual role for 7 sets a stringent relation between the tail 
steepness and the width of the core which is not always satisfied by the data (see Figure 
I3.4.ip . E.g., improvements of the reconstruction algorithm delivered a PSF with an en- 
hanced core and a "broken" power law tail. A single King function is unable to model this 
more complex shape. A new analytic function, the weighted sum of two King functions, on 
the other hand, adequately models the PSF. Thus, each energy/cos(^) bin actually has 5 
independent parameters, ac, crt, 7c, Jt, and w, i.e., the parameters for a core and a tail King 
function and the w, the relative weighting of the two. 

3.4-2 Energy Dispersion 

As with the reconstructed position, the reconstructed energy is subject to uncertainty. At 
the highest energies, above about 100 GeV, an appreciable fraction of the energy in the 
electromagnetic shower escapes the calorimeter, making an energy estimate difficult. At 
lower incident energies, a significant fraction of the photon energy may be deposited in the 
TKR. In between these two extremes, there is uncertainty due to shower particles missing 
the active elements of the calorimeter and the natural scatter induced when estimating the 
deposited energy from the crystal light yield. 

The magnitude of these effects are shown in Figure 13. 4. 2^ where it can be seen that the 
half-width at half-maximum (HWHM) at most energies is less than 10% but with appreciable 
tails and rather wider HWHMs at low energy, about 15 — 20%. This level of uncertainty 
is smaller than — but not much smaller than — the width of the energy bins introduced in 
Chapter IH which have a full width of about 30% at 8 bins per decade. 

In the following sections, we ignore the effects of energy dispersion by assuming it is 
negligible. While we see here that this is not the case, we make this simplifying assumption 
because it drastically reduces the computational burden of spectral analysis. (Including 
the effects of energy dispersion essentially requires an extra quadrature for all computed 
quantities.) We expect this approximation may not be too bad since, unlike the PSF, 
the energy dispersion does not itself depend strongly on energy. If our effective area were 
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independent of energy, then for power law sources, the only effect of energy dispersion 
would be a slight softening of the measured slope and a slight increase in the measured flux. 
Since the effective area rapidly rises for the first decade of the LAT passband, the energy 
dispersion does introduce some curvature in the measured spectra and causes hardening of 
measured spectra and decrease of measured fiux, as we shall see in ^5.1.51 However, the 
resulting bias on the measured parameters is found to be < 10%. 

3.4-3 Time Dispersion 

The LAT uses GPS for a time reference and measures the absolute time of photon arrival 
with an accuracy better than 1/xs |78l[22]. The error is negligible for nearly all applications, 
so we approximate /tdisp = S{t' — t). 

3.4-4 Approximate Observed Event Rate 

Given these approximations, the modeled observed event rate for a source becomes 

r{E',n',t';X) = J dnj^{E' ,t' ,n-,X) A[E' ,cos9{t')]e{E' ,t') fp,{[n'-,n,cose{t'),E']. (3.10) 

For a point source, we obtain 

r{E', Q', t'; A, Qq) = F{E\ t'; X)A[E', cos 9{t)] e{E' , t') fpsd^'] f^o, cos 6{t), E']. (3.11) 

The simplification delivered for point sources by the accurate timing of the LAT and the 
neglect of energy dispersion is that we may express the observed rate without any integra- 
tion. Diffuse sources, on the other hand, require a two-dimensional convolution to calculate 
the observed rate. 

3.5 Poisson Likelihood for the Fermi-LAT 

We now outline the formulation of the likelihood for the Fermi-LAT . As with all particle 
detectors, Fermi-LAT is a counting experiment. Events are binned by the observed quan- 
tities, particularly energy and position. They may also be binned in time, or accumulated 
into a single bin spanning the length of the observation. Thus, in the notation of H3.2\ X is 
the set of counts observed in each bin, which we now relabel N . 
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Figure 3.5: The energy dispersion (observed/simulated - 1) for the P6-v3-diff IRF for events 
incident on-axis. The top (bottom) panel shows events converting in the front (back) of 
the TKR. At low (order 100 MeV) energies, the resolution is significantly better for back- 
converting events since the shower commences closer to the calorimeter, i.e. less energy is 
deposited in or escapes from the TKR. 
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Each element of is distributed according to a Poisson distribution with unknown mean, 
Tj. We model this mean by integrating either Eq. 13.101 (diffuse sources) or Eq. 13.111 (point 
sources) over the bin. Since the components of A^, Ni, are mutually statistically independent, 
the probabily mass function for the data is the product of Poisson distributions with rates 
Tj. We recall that for a Poisson distribution with mean r, the probability to observe A'^ 
counts is 



p{N;r) = — exp(-r). 



(3.12) 



As noted in the introduction to this chapter, with the broad PSF of the LAT and the 
strong diffuse background present in the GeV sky, the rate for a bin of phase space involves 
contributions from multiple sources. Thus, using the probability mass function in Eq. I3.12( 
we write the (logarithm of the) binned likelihood for all selected data by summing over all 
A'bins bins and all Ng sources: 



log£(A;iV)= ^ 

1=1 

i=l 




Ns 



Ns 



^rj{E',n',t';X,) + NilogY, 

bin- J = l 

Ns Ns 

Rij + Ni log Rij 




rj{E' ,t';\j 



bin 



(3.13) 



(3.14) 



Here, Ni are the observed counts in the ith bin, and the triple integrals are over the energy, 
position, and time range specified for each bin. In the second line, we have defined Rij, the 
expected number of counts in the ith bin from the jth source. We have discarded the A^! 
term from Eq. 13. 121 since it is independent of the model parameters. This is the form of the 
likelihood we shall find useful in the following chapters. 

Typically, the bins are continguous, and we define the region of interest (ROT) to be the 
portion of the total data (the volume of phase space) we have selected. The likelihood then 
becomes 

r r [■ N^ins Ns 

logCiX; N) = - 1 1 1 Y^jiE' ,t';Xj) + Yl N.logY Rir (3-15) 

ROI ^ J=l 

This formulation facilitates the introduction of unbinned Poisson likelihood, in which the bin 
widths are taken to be infinitesimal, such that only or 1 counts can possibly be observed. 
This formulation sacrifices no information to binning but can become prohibitive for large 
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data sets. The unbinned likelihood, with now containing every count, is given by, 

r r r -'Vevcnts Ns 

logC{X,N) = - JJJ ^r,iE',n',t';X,)+ ^ log^rj{E'„n'„t'-,X,), (3.16) 

ROI ■'=1 

where denotes the reconstructed energy of the ith event, etc. 
3.6 Summary 

We have motivated the use of likelihood principles in GeV astronomy by demonstrating 
that — at least for current telescopes — it is impossible to make reliable measurements of 
astrophysical quantities directly from the data. Instead, we noted that GeV sources can be 
easily modeled with only a few parameters and suggested the method of maximum likelihood 
to estimate these parameters. To calculate the likelihood, we fold the parameterized models 
through the instrument response function, thus naturally accounting for the complex IRF, 
source confusion, and the constantly-changing orientation of the instrument. We outlined 
the features of the IRF and provided a convenient analytic approximation for the PSF in the 
form of the King function, and we argued that we could ignore the effects of energy dispersion 
at the cost of some bias to our measured parameters. Finally, we provided an expression 
for the log likelihood appropriate for Fermi-LAT (Eq. 13.13^ and for the expressions for 
the observed rates from diffuse sources (Eqs. 13.101 and 13. lip whose evaluation shall be our 
endeavor in the next chapter. 
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Chapter 4 

THE POINTLIKE PACKAGE: DESIGN AND IMPLEMENTATION 

As outlined in Chapter [U hkehhood analysis is the optimal method for characterizing 
7-ray sources. (We discuss in Chapter [6] likelihood techniques for source detection as well.) 
Unfortunately, likelihood analysis is complex. One must manage source models, descrip- 
tions of the instrument, and data. Forward folding the models into "data space" can be 
computationally intensive. Analysis for a single source (which, due to the PSF-broadening, 
entails analysis of many sources!) can be time consuming. All-sky analysis requires large 
computer clusters to handle the computational burden. However, by binning the data with 
scalable pixels and sacrificing some accuracy for time-saving approximations, this situation 
can be ameliorated. In this section, we describe the pointlike software package, a collection 
of routines for performing maximum likelihood (ML) analysis of Fermi-LAT data, which 
aims for the canonical goal of "making easy things easy and hard thing possible" . 

At the highest level, pointlike is a collection of Python modules designed to be used both 
interactively (for analyzing a single source) and in batch mode (for analyzing many sources). 
pointlike also relies internally on a large C-|— |- codebase via SWIcQ, a tool providing wrapper 
code to access compiled libraries from interpreted languages. The interface and organization 
are more properly left to documentation of the code, and in this chaper, we discuss the "nuts 
and bolts" of the implementation. 

4.1 Fermi-LAT Data 

For all the complexity of the instrument, the high-level Fermi-LAT data is simple, a list of 
meausured quantities from reconstructed events. In the following section, we provide details 
about this list, our method for binning the events, and some necessary cuts that must be 
made to reject background. 



http:/ /www. swig. org 
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4-. 1.1 High-level Data Products 

Information about each event is telemetered from the S/C to a batch farm at SLAC for 
event reconstruction, in which the low- level hardware readouts are converted into estimates 
of physical quantities [22]: the energy, position, and time of arrival of the putative 7 ray. 
There are two high-level data products. The FTl file is an event list in FITS format with 
columns containing reconstructed quantities (energy, R.A., Decl., time, etc.) for each event. 
The FT2 FITS file tabulates the position and orientation of the S/C at 30-second intervals, 
information needed to construct the time-dependent IRF. 

4.1.2 Binning the Data: HEALPix 

Binning Fermi-LAT data is not a trivial task. Due to the energy-dependence of multiple 
scattering in the tracker, the LAT's angular resolution varies by nearly two orders of mag- 
nitude over its energy range (Figure [3]). If one uses only a single size for position bins — e.g., 
the EGRET choice of 0.5° x 0.5° [66] — then one faces a tradeoff: lose information at high 
energies by using bins much larger than the ultimate instrumental resolution, or increase 
the computational burden by using bins much too small for the poor angular resolution at 
low energies. 

We address this problem with IIEALPix[53j, a scheme for tessellating the sky with equal- 
area pixek^. The base tessellation comprises 12 pixels, and finer binnings are achieved by 
subdividing the base pixels; if there are A'sije subdivisions on a side of the base pixels, 
there are 12 x N'^:^^^ pixels. A'side thus controls the granularity of the pixelization. (A useful 
relation is that a pixel is approximately 60°/A''side on a side.) The finest binning possible is 
limited by computer architecture, as each pixel must be uniquely mappable to an integer. 
We enforce A^side < Siofl yielding a maximum resolution of about 30", more than adequate 
for the finest resolution PSF of the LAT. 

In addition to the finely-controlled pixel size, HEALPix facilitates sparse binning. Due 

^HEALPix also describes a particular projection used in demarcating the pixels. We do not make explicit 
use of this feature. 

^8192 is the largest power of two for which 12 x A'^d^ < 2"^^, the largest value of a signed integer available 
on 32-bit architecture. 
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to the falling spectra of 7-ray sources and the flat effective area of Fermi-LAT , there are 
relatively few high energy photons and most pixels are empty. By assigning each pixel a 
unique index, the data can take the form of a mapping of index to counts, and we need only 
store those indices with non-zero counts. 

Data are initially binned into front-converting (those that pair produce in the forward 
section of the TKR with the thin tungsten foils) and back-converting (producing the first 
pair on a thick tungsten foil deep in the TKR) events. This distinction is maintained 
throughout the pointlike package, and a distinct IRF is used for each class. Data are next 
divided into uniform bins in logarithmic observed energy. (Logarithmic energy is the natural 
choice for broadband spectra, which often look like power laws.) By default, we use either 
4 or 8 bins per decade, with the left edge of a bin at 100 MeV. Such energy bins align with 
that used in CALDB for the PSF parameters and allow the use of the same PSF parameters 
over an entire bin width. 

For each energy bin (and conversion type), the N^ide parameter is chosen so that the 
pixels are somewhat smaller than the core of the PSF at that energy. Precisely, 

8192 

side - ^ ^ 8192/iVo X (£;/100MeV)-0-8 x exp -(£;/2000MeV)2 ' ^ ' 

with A'o = 86 (A^o = 52) for front (back) events. This binning puts roughly 5 pixel widths 
within the 68% containment radius of the PSF at the given energy, up to about 1 GeV. 
At 1 GeV, this binning rapidly asymptotes to Aside = 8192. At 1 GeV, the PSF-based 
pixels are sufficiently small and the data sufficiently sparse that the mean pixel occupancy 
drops to 1. We gain no computational savings by using PSF-based pixels, and so use the 
smallest pixels possible. Above 1 GeV, the pointlike binning becomes essentially unbinned 
in position. (The binning in energy persists, of course, as this binning allows us to define a 
coarse-grained PSF.) In Figure 14.1.21 we show the mean pixel occupancy for 18 months of 
data using the scheme in Eq. 14.11 The compression ratio at low energies is large, especially 
for back-converting events, and will grow linearly with time. (Essentially every pixel at low 
energies is occupied.) For all energies, the total compression ratio is about 2.8. 

The "unbinned limit" is relatively robust against loss of efficiency with increasingly 
large data sets. The primary worry is if, by selecting very small pixel sizes, we prevent the 
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Figure 4.1: The compression ratio of the data, i.e., the mean photons per pixel, for a given 
energy band. The occupancy drops rapidly as the energy approaches 1 GeV due to the 
improving PSF and falling spectra of astrophysical sources. 
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mean occupancy of a particular band from increasing above 1, thus gaining performance. 
However, the generally steep spectra of GeV sources and the sharpening PSF conspire to 
prevent this. At 1 GeV, for instance, the scheme of Eq. 14.11 gives A'side = 644 (401) for 
front (back). These are approximately PSF-sized pixels, and there are 4.98 and 1.93 million 
pixels, respectively, for these A'^gide settings, while for an 18-month data set, only 0.36 and 
0.33 million of these pixels, respectively, are occupied. Even in the extremely conservative 
limit that the number of occupied pixels grows with time, we must collect 9 years of data to 
reach a non-sparse pixelization for 1 GeV. A fortiori for higher energies with fewer photons 
and smaller PSF-based pixels. 

During the binning process, we make additional cuts. An important facet of the binned 
analysis is that the PSF not depend too much on incidence angle, so that a "collective" PSF 
for a bin makes sense. This is generally true (see Figure l4.1.2p : the variation of the PSF 
with incidence angle is on the order of 10/^. We typically reduce the variation even more 
by discarding events with a reconstructed incidence angle > 66.4°. Thus, a conservative 
choice of a pixel size appropriate to the on-axis PSF will remain efficient for all incidence 
angle. 

Another cut is made to remove most of the 7 rays produced by cosmic ray interactions 
in Earth's upper atmosphere [1]. Ultrarelativistic cosmic rays lead to a 7-ray cross-section 
in the Earth frame strongly peaked in the forward direction, and so from the point-of- 
view of the S/C, the intensity is approximately proportional to the total grammage of the 
atmosphere along the line-of-sight, i.e, the emission is dominated by the limb of the earth. 
The polar angle of the limb as measured from the zenith is given by 

180° - arcsin ( ) , (4.2) 



about 113° from the zenith for Fermi's altitude of 550km. This number varies slightly (linear 
in the altitude) during the orbit, which is slightly elliptical, and decreases as the S/C slowly 
loses altitude to atmospheric drag. Although the position of the limb is independent of the 
S/C orientation, more albedo photons are successfully reconstructed when more of the limb 



^Although the PSF containment radius increases to up to 50% of the on-axis value, the effective area is 
approximately proportional to cos(6) and very few events are collected with this large PSF. 
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Figure 4.2: The PSF — characterized by the radius in which 68% events are enclosed — as 
a function of incidence angle, energy (in GeV), and conversion type ("CT" in legend). 
Generally, the angular resolution is best on-axis and degrades as the edge of the field-of- 
view is approached. Using a larger containment radius, e.g., 95%, as a metric yields nearly 
the same dependence on incidence angle and magnitude of shift. The standard cut on 
reconstructed incidence angle for pointlike of cos{9) = 0.4, or > 66.4°, is indicated by the 
vertical black line. 
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is in the field of view, i.e., when the S/C is at larger rocking angles. At energies > 100 MeV, 
the reconstructed zenith angle of albedo photons is generally > 105° (see Figure [3. 1.2p : only 
events in the tails of the PSF at the lowest energies "disperse" more than 8° from their true 
directio We therefore apply a cut on the reconstructed zenith angle > 105°. 

Since the cuts on incidence and zenith angles use the reconstructed photon position, 
some events that would be cut if the true position were known are included, and others 
that would pass a cut in the limit of perfect reconstruction are discarded. In addition to 
such contamination, a small bias results. For instance, since the effective area increases 
approximately linearly with the cosine of the incidence angle, a cut on incidence angle will 
necessarily discard more events whose true incidence angle is less than the cut value. Since 
the limb tends also to be on the edge of the FOV, similar remarks apply. These effects are 
small, and we will ignore them. 

Finally, we only allow events that lie within a Good Time Interval (GTI). These GTI 
form an additional table in the FTl file and are simply a list of tuples with the start and 
stop time of an interval during which we accept events. By default, the GTI are determined 
by SAA passages. (This cut is trivial since no data is taken during SAA passage.) However, 
additional temporal cuts may be made based on the S/C position and orientation. For 
instance, as the S/C rocks, it may overshoot its target rock angle and spend time at a 
larger rock angle with higher albedo background. Similar excursions are caused when the 
S/C makes a pointed observation. Such times can be excluded from the GTI. Gamma-ray 
bursts (background for other sources) may also be so excluded. Finally, when studying a 
particular source, in order to further reduce contamination from albedo events, times when 
the ROI is too close to the limb can be excluded. 

4.2 Point Sources Rates 

The data, whose treatment we have just discussed, and the source models (discussed in 
Chapter [3]) form the "bookends" of the likelihood analysis, and we have now to link the 

^In fact, a nontrivial — relative to astrophysical source rates — number of limb photons with energies of 
order 100 MeV survive this cut. The remaining photons contribute a non-isotropic foreground to astro- 
physical sources, and should be accounted for in background modeling. [The contamination is increased 
with a larger rock angle.] 
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Figure 4.3: Reconstructed events (P6 "diffuse" class) with 100 MeV < E < 100 GeV in a 
10° aperture centered on (l,b) = (0,-45), integrated from Aug. 4 2008 to Jul. 18 2010. The 
binning is in the reconstructed zenith angle. The effect of the PSF is clearly seen in the 
broadening of the distribution of limb photons about ~ 113°. 



43 



two by performing the integrals in Eq. 13.131 and folding the intrinsic source rates through 
the instrument response function. These integrals represent the primary computational 
(and infrastructural) difficulty in likelihood spectroscopy, and we devote some effort making 
time-saving approximations when possible while maintaining good overall accuracy. 

There are two fundamental source classes: those that can be spatially resolved (diffuse 
sources) and those that cannot (point sources). The former includes the Galactic diffuse 
background due to cosmic rays, the extragalactic background due to distant sources below 
detection threshold, nearby galaxies such as the Magellanic clouds, supernova remnants, 
and some pulsar wind nebulae. The class of point sources includes the active nuclei of 
galaxies, pulsars, and small or distant remnants and nebulae. Since diffuse sources require 
convolution with the PSF, the two classes are treated independently, and we discuss point 
sources and diffuse sources in turn. 

We begin with Eq. 13.111 which we recall is the differential (in phase space) expected 
rate from a point source, neglecting energy dispersion. To compare this to observations, 
we need to integrate it over each of the bins we have selected for the data. We begin with 
integration over observed energy and time, i.e., we calculate the quantity 

ri{n') = j dE' j dt'T{E',t';X)A[E',cose{t)]e{E',t')fp^{[n';no,cose{t),E'], (4.3) 

the expected rate per unit solid angle in the ith. bin. (We leave the particular energy and 
time abstract; the bounds of the integrals over energy and time are implicit and extend over 
the ith bin.) For brevity, we suppress the prime notation in the sequel. 

4-2.1 Integral Over Time 

In order to evaluate this expression quickly, we make approximations and assumptions that 
allow us to pre-compute some quantities and simplify the integral. First, we assume that 
F{E,t\X) = F{E;\), i.e., that the source is stationary during the time range specified by 
the bin. Alternatively, we can view this assumption as measuring the time-averaged flux 
from the sourcqj. If we are interested in capturing the time dependence of the source, we 
simply use multiple time bins and proceed as before. 

®In fact, due to the time-dependent IRF, these statements are only equivalent to first order. 
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Next, we write e{E, t) = eo{t)+ei{E, t). eo is the dominant contribution to the efficiency, 
determined by the deadtime (trigger rate) and is energy independent, ei is the second-order 
correction in which the reconstruction effectiveness changes based on the charged-particle 
background. To a very good approximation, this correction depends hnearly on the trigger 
rate, eo, i.e., ei{E,t) = fi{E)eo{t) + f2{E), and we can reconstruct the full quantity from 
the simp le-to- measure eo. 

Now, we break up the integral over time into sub-integrals, 

N 

dt^Y. (4.4) 

i=i 

where each integral is over one of the (typically) 30-s intervals laid out in the FT2 file, and 
there are such intervals in the parent bin. The sampling rate of the FT2 file is designed 
to be sufficiently high that the S/C orientatiorLj does not change much over the course of an 
interval. Therefore, we evalute each sub-integral with a central-value approximation, and 
the time integral becomes 

N 

Ati A[E, cos 9{ti)] e{E, U) /p^f [ll; llo, cos e{ti), E], (4.5) 

1=1 

with ti the center of the zth FT2 bin and A t is the bin width, typically 30 seconds. 

Next, we discretize cosO, typically into 40 bins spanning from 0.2 to l.Qj. The time 
integral becomes 

Yl ^^i^cose, [cos 9{ti)]A[E, cos OiU)] e{E, U) /p^f [O; llo, cos e{ti),El (4.6) 
j=i i=i 

with Icose^ [cos 0(tj)] the indicator function that evaluates as 1 if cos9{ti) is in the jth 
cos 9 bin and otherwise. We define the livetime, Tj{E) = X^^Li ^ti^cosejS{E,ti) and the 
exposure, ej{E) = Tj{E) x A[E, cos 9j], in terms of which the time integral becomes 

Y^jiE)fpsd^;no,cos9j,E], (4.7) 

^This assumption is only marginally accurate when the S/C slews from its north-pointing attitude to 
its south-pointing attitude, but such a slew only occurs once per orbit, and < 10% of the livetime is 
accumulated during a slew. 

*^The LAT field of view is cos 6* > 0.2. 
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or in terms of the summed exposure, 

e{E)^ej{E)/e{E)fp,f[n;no,cose,,E]. (4.8) 

We define the exposure-weighted PSF, /„psf (ll; (Iq, E) = ej{E)/e{E)fp^i[^- IIq, cos O^.E], 

and the time integral becomes, finally, 

e(i?,f1o)/wpsf[f1;^1o,i5^]- (4.9) 

The immense benefit of all of this definition legerdemain is that the livetime calculation is 
entirely independent of the source. We can pre-compute the livetime for some pixelization 
on the skjlfl and have it available for all subsequent analyses using the same data. The 
time integral is thus entirely eliminated in favor of a sum over ~ 40 bins in incidence 
angle, a great improvement over the original sum over the S/C pointing history containing 
potentially millions of terms! 

4-2.2 Integral over Energy 
The bin rate is now 

r0) = J dEF{E, A) e[E, Oq) /wpsf(f1; ^1o, E). (4.10) 

As it stands, the expression is still computationally burdensome, as we would need to 
evaluate the PSF at each data pixel over many energies. To further simplify, we appeal to 
the Mean Value Theorem, according to which J dx f{x) g{x) = f{x') J dxg{x), for some x' 
within the bounds of integration. Thus, if W6 dctcrininc the cippropria-te energy, -E'opt ? sit 
which to evaluate /wpsf; we can remove the PSF from the energy integral, obtaining 

rjin) = Up,i{n;Qo,Eopt) J dE TiE,X) e{E,Qo). (4.11) 

Unfortunately, it is not possible to find a unique -Eopt- Although within a given energy 
bin, the energy dependence of the PSF is entirely given by Eq. 13.81 the energy dependence 

bins are typical 
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of the King function depends on position, i.e., i?opt is a function of = arccos(r2 • r^o)- 
Indeed, for small 0^, / oc cr~^ oc E^^-^, while for large 9'^, f oc a^'^^'''^^'^ f« cr^ oc E^-^. 

Rather than introducing an additional angular dependence through E^pt, we seek E'opt 
that gives the best representation of the PSF over a wide range of source spectra and 
positions. The natural metric for this is the log likelihood, Eq. 13.131 We concentrate on a 
single source centered in the ROI and write the expected counts for a given band as 

JdnJ dET{E, A) e{E, Qq) U^sii^; Qq, E) (4.12) 

I "max I _f _f 

^TT de^ dEJ^{E,X)e{E,no)Up,{{e\E) (4.13) 







TT I dO^ I dEN{E)Up,f{e\E) (4.14) 





f6l2 



I max 

= / dO^gie^), (4.15) 
Jo 

where the symmetry of the centered source allows the trivial azimuthal integral and we have 
made a small angle approximation. We now define an approximate rate using the "mean" 
Mean Value Theorem, 



)2 

max 



ttJ d9' J dEN{E)Up,i{9',E) (4.16) 

=TT J "^^^ d9^Up,i{9^Eopt{9^)) X J dEN{E) (4.17) 

^TT I ™'^(i0Vwpsf(e',^opt) X / dEN{E) (4.18) 

max 

d9^f{9\Eopt) (4.19) 







Now, the log likelihood for the energy band becomes 

d9^g{9^)- 



fmax r''i2 

log£i=/ d9^ g{9^) -Y^nilog d9^g{9'') (4.20) 

d9^g{9^)x [I- log g{9% (4.21) 
where in the second line we have replaced the observed counts with the expected rate and 







47 



gone to the unbinned limill^l. In terms of the approximate rate, the log Hkehhood is 

log£2(^opt) = / ™'(^02A^/(0^^opt) X [l-logiV/(02^Eopt)], (4.22) 

JO 



and now we simply solve the integral equation log£i = log £2(-E'opt) for 



opt • 



mm; 



In practice, -Eopt is within a few percent of the geometric mean, -Egeo = V-^max x 
of the energy band, and we obtain a sufficiently good solution for -Eopt with a single Newton^ 
Raphson iteration, 

d log£2(Kpt 



^opt ~ £^geo + [log£l - log£2(-Egeo)] X 



1 

-(Egco) 



(4.23) 



dEopt 

We apply this procedure to obtain an estimate of E^pt for each energy band (and event 
conversion type). The results for a typical ROI (centered on the position of the Vela pulsar) 
are shown in Figure 14.2.21 We see that, for a variety of source spectra, -Bopt is within a 
few percent of -Egeo- The largest discrepancy is at energies of a few hundred MeV. Here, 
the effective area rises rapidly (Fig. 13. 4p . meaning counts in the band are shifted towards 
higher energies, yielding a better effective PSF than -Egeo implies. The same effect causes 
the dependence on spectral index: softer sources emphasize lower energies, shifting E'opt 
lower. The overall trend for an ii^opt lower than -Egeo can then be understood to have its 
origin in the 1/E (uniform in logarithmic energy) spectrum used to determine the PSF. In 
any case, at all energies, the Sopt for the different spectra are within a few percent of each 
other. Therefore, we adopt the omnibus prescription of calculating ii^opt for a photon index 
(r) of 2.0. We present empirical results showing the improvement of the PSF representation 
using the "i^opt" prescription in §5.1.11 

We now begin the final step. We have the expected rate per unit solid angle, and we see 
from Eqs. 13.131 and 13.151 that we need to integrate this rate over both the entire ROI and 
over each data pixel (the terms on the right- and lefthand side of Eq. 13.151 respsectively) . 
Recalling from the discussion on binning ( ^4.1.2p that we choose Ngi^if. such that our pixels 
are small compared to the PSF, we use a simple central value approximation to evaluate 
the righthand side terms. The lefthand side terms, the integrals over the ROI for a given 

^°The differential in the log term is not problematic, as it is multiplied by a second differential, and 
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Figure 4.4: Comparison of the geometric mean (^Egco = \/ -f/max ^ -^min) ^f energy band, 
with the optimal energy -Bopt as estimated by the log likelihood method described in the 
text. The binning is uniform in logarithmic energy with 8 bins per decade and separate 
bands for front- and back-converting events are shown. The relative difference of E'opt from 
Egeo for three different power law source spectra with photon indices 7 = 1.5, 2.0, 3.0 is 
shown. 
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energy band, become 



r(A) 



dEF{E,\)e{E,VtQ 



N{\) 



(4.24) 
(4.25) 
(4.26) 



Here, we have defined 0(-Eopt; ^^o); the overlap integral. The overlap integral is simply the 
integral of the PSF over the ROI for a particular source. It must be between and 1, and 
it depends only on the source position, the PSF for the energy band as obtained through 
the method discussed above, and the radius of the ROI. The payoff of the decoupling of the 
energy and position integrals we achieved by selecting -Eopt is evident when we consider the 
log likelihood for the energy band, 



= -Yl o(^opt, %) iVj(A) +Y.^i 12 /wpsf(^*; nj,Eopt)Nj{x), (4.27) 

j i j 

where we see that the dependence on A comes only through the A^(A) terms. Since only the 
A^(A) terms change during the likelihood maximization, we are free to pre-compute once 
and for all the overlap integrals and the data-pixel contributions for each source. We have 
already discussed the latter calculation; we now proceed with a description of the terms on 
the lefthand side of Eq. OTl 

Evaluating the Overlap Integrals 

While we make the central-value approximation to evaluate the PSF integral over the data 
pixels, the overlap integrals are more complex. First, we assume that the ROI is circular. 
Through this assumption and the azimuthal symmetry (for sufficiently long integrations) of 
the PSF, we can reduce the 2-d integral to a quadrature. Let the ROI center be denoted 
r^o and the source position be Vtg- By the symmetries of the ROI and PSF, the only 
relevant quantities are the arclength separating the source and the ROI center, denoted 
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5 = arccos 0,o ■ ^Is, and the radius of the ROI, R. In terms of these quantities, 

/f I dd^ 1 / 9^ \ 

dnUp,iin;no,Eopt) = lld4>—^Uk ( ^ j (4-28) 

ROI ROI 

= — duUk{u)= / F^k[u2m - F.^k[ui{<p)], (4.29) 

ROI ROI 

where we have introduced the exphcit weighted-sum of King functions for the PSF. (As 

discussed in ^3.4. 1^ some versions of the LAT IRF express the PSF as the sum of two King 

functions. The overlap integral is linear in the King function, so the derivation presented 

here carries over directly. Likewise, -F^^ is an exposure-weighted sum, subject to the same 

linear arguments.) F^/. denotes the (analytic) cumulative distribution of the King function, 

and we have achieved reduction to quadrature. All that remains is to determine the bounds 

of integration, i.e., ui (</>), U2{(j)), and the bounds of the azimuthal integral. 

For the case of a point source located within the ROI boundary, 6 < R, ui{(j)) = 0. 

We place the x-axis on the diameter extending through the 0, and the ROI center. The 

integrand is clearly symmetric about the x-axis, so 

0{Eopt,S<R,R) = - I dcpF^k[u2{4>)]- (4.30) 
71" Jo 

U2{4>) can be determined from plane geometrjj^: 

a^2u2{^) = - [5sin(</.)]2 - 5cos(0), (4.31) 

where a stands in for the a parameter for each relevant King function. 

For an exterior source, 5 > R, both bounds in u are non-trivial. We again place the x- 
axis on the diameter joining and the ROI center and again note the reflection symmetry. 
The (j) integral extends from the x-axis to the tangent joining J^q and the boundary of the 
ROI. That is, 

2 parccos{R/S) 

OiEopt,S> R,R) = - d^F^k[u2m-F^k[uim. (4.32) 

71" Jo 



"'^'^The assumption, of course, is that the ROI is sufficiently small that we may approximate the space as 
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The bounds of integration are again determined from simple geometry: 



t7x/2ni(0) = 5 cos((^) - Vii2 _ [5sin(0)]2 (4.33) 
V^M^) = S cos(,/.) + - ['^sin(0)]2 (4.34) 



Evaluating the Source Counts 

Recall that we defined N{\) = f dE F[E,\) e{E,9.Q) = f dE f{E). We evaluate this in- 
tegral with a composite Simpson's rule quadrature over logarithmic energy. If the energy 
range of the band runs from to e^, then 

eafiea) + etfiet) + ^ (2 + 2Xe2Z+i(i)) x e.^^*" V(ea5''-') 



where 6 = (efc/ea)^/^^ X(i) is 1 (0) for odd (even) integers, and Ng (an even integer) is 
the number of intervals into which we divide the band. Ng moderates the tradeoff between 
speed and accuracy. In Figure 14.2.21 we explore the accuracy of the integral of a power law 
source over the (typically) lowest energy band in a pointlike analysis. The effective area 
here is rapidly rising and bumpy (see Figure [3^ . making this integral a good test case. We 
see A'^s = 8 gives an error < 1%. Since the evaluation of these integrals is not currently a 
serious bottleneck in the pointlike code, we adopt the conservative choice Ng = 16. 

With a quadrature rule in hand, we can make one additional approximation for speed. 
We cache the specified energies Ei and factors (the Is, 2s, and 4s in Eq. I4.35P for each 
band. Further, since the exposure varies relatively slowly in position, over a typical (10°) 
ROI we can approximate the exposure for a particular source by the exposure at the center, 
making a first order correction by normalizing to the exposure value at -Egeo- We can then 
cache Wgi = ki Ei e{Ei,il.Q), with ki the Simpson's rule factor, and evalute the integral, for 
an arbitrary source at position 0, as 

j:^.^(i^.,Al. (4.36) 

Evaluating this expression is extremely fast, and the exposure approximation introduces 
negligible (<< 1%) error (see Figure l4.2.2p . 
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Figure 4.5: Two power law spectra {fi{E) oc E~^'^ and f2{E) oc E^^'^) integrated over the 
most problematic energy band in a typical analysis, 100-133 MeV in 8 bin-per-decade mode 
and 100-178 MeV in 4 bin-per-decade mode. The relative difference with the exact integral 
(as determined by numerical integration to machine precision) is shown as a function of the 
number of intervals used in the composite Simpson's rule. 



53 




Figure 4.6: The exposure variation over a typical ROI (here, located at the position of the 
Vela PSR.) The approximated exposure assumes the same energy dependence as that at 
the ROI center and corrects only for the normalization at the geometric mean energy of the 
band. The exact value accounts for the spatial variation at all energies. Relative differences 
are shown for point sources at a modest (5°) and extreme (10°) separation from the ROI 
center. The band chosen, 100 < E'/MeV < 178, has rapidly- varying effective area, making 
this a worst case. The error is < 0.5%. 
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4-2.3 Point Source Summary 

Our goal in this section was to calculate the contribution of a point source to a partic- 
ular energy band. This task entailed evaluating the point source model (convolved with 
the IRF) both over the sparse data pixels and over an entire spatial aperture. We made 
approximations that allowed us to define a single PSF for the energy band, fwpsfiEopt), 
in turn allowing a factorization of the source rate integral into a spatial component and 
an energy component. We reduced the spatial integral over the aperture to a quadrature, 
and we outlined a rapid method for performing the energy integral for each source. Taken 
together, these approximations generally retain percent-level accuracy but greatly speed up 
computation of the contribution of point sources to the likelihood. 

4.3 Diffuse Sources Rates 

We now come to the calculation of the rates of diffuse sources, those that have resolvable 
angular extension in the LAT data. We return to Eqs. 13.101 and 13.131 which outline the 
integrations to be performed. The argument parallels that for point sources. We again 
begin by making the assumption that the diffuse sources are stationarjQ. As for point 
sources, we introduce the exposure and exposure-weighted PSF to arrive at the following 
expression for the expected rate per unit solid angle of a diffuse source for a given energy 
band: 



This expression is similar to that for point sources, Eq. 14. 101 but with an additional compli- 
cation, the convolution of the diffuse source count rate per unit solid angle with the PSF. 
In the following, we outline the evaluation of the integrals over energy and position. 

The strategy differs slightly from point sources. Typically, we deal with only a couple of 
diffuse sources, the Galactic diffuse background from cosmic rays and the (approximately) 
isotropic background from unresolved sources at cosmological distances, irreducible charged 

^■^This is of course perfectly true for celestial diffuse sources. Photons from the Earth's limb, and from 
irreducible charged particle background, have some time dependence based on the S/C precessional phase 
and the active rocking profile. These effects are certainly second order. 




(4.37) 
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particle background, and contamination from the Earth's limb. With only two sources com- 
pared to tens or even hundreds of point sources, we can be more lavish with computational 
resources. Furthermore, all but the brightest point sources are utterly dominated by the 
diffuse background at low energy, particularly in the Galactic plane. To avoid bias, it is 
important to model the diffuse sources with as much accuracy as possible. 

4.3.1 Energy Integral for Diffuse Sources 

As with point sources, we adopt a composite Simpson's rule for the integral over energy. 
Adopting the same notation, 

Ns+l „ Ns+1 

r{n') -Y^Wi dnj^{Ei,n;X)e{Ei,n)f^,{{n';n,Ei) = Wirin',Ei), (4.38) 

i=l i=l 

where in this case Wi is the product of the Simpson's rule factor and Kj. The expression 
/ di}.J^{Ei,t',^;X)e{Ei,i},) f^psi{il';i},,Ei) is extremely costly to evaluate, so we are more 
cautious in selecting Ng. Indeed, since only the low-energy bands, with their bumpy and 
rapid rise in the effective area, require care, we adopt Ng = 8 for bands with E < 200MeV 
and ATj = 4 for all remaining bands. 

Unlike for point sources, we do not attempt to pull the PSF out of the energy integral for 
two important reasons. First, in keeping with our remarks above, we want to evaluate the 
integral as accurately as is feasible. Second, unlike point sources, the spatial morphology of 
diffuse sources can and does change with energy. Such morphology shifts within an energy 
band are likely to be small but are in any case explicitly accounted for in our formulation 
here. 

4.3.2 Spatial Integral (Convolution) for Diffuse Sources 
We now have to evaluate 

r{n',E) = J dnT{E,n;X)€{E,n)f^ps{{n';n,E). (4.39) 

The primary source of interest is the Galactic diffuse background. The construction of the 
model is beyond the scope of this work, but uses methods similar to those employed in the 
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analysis of [T4|. It is typically structured as a mapcube, a pixelized intensity map of the 
sky stored at a series of energy planes. We construct the continuous function J-'{E, J7; A) 
by first selecting the two image planes bounding E, determining the spatial intensity in 
the bounding image planes via bilinear interpolation of the four pixels closes to fi, and 
finally logarithmically interpolating these two values in energy. The isotropic background 
is typically tabulated, and we simply interpolate the table. The isotropic model acquires 
spatial variation through multiplication with the exposure. We write a general model as 
^{E,^}) = J^{E,(};X)e{E,Cl). Since the PSF only depends on the difference between the 
true and reconstructed positions, the spatial density is 

r(d') = j dnv{E,n)Up,iiE,n' -n), (4.4o) 

readily seen to be a two-dimensional convolution. 

Recall that a convolution of two functions, / and g, is defined in one dimension as 

+00 

{f*9){x')= I dxf{x)g{x' -x). (4.41) 



The convolution arises in probability theory as the probability density function for the sum 
of two statistically independent variables with probability density functions / and g. (The 
integral is over all combinations of values drawn from / and g whose sum has the correct 
value.) Now, suppose f or g is concentrated near the origin. The convolution is then well 
approximated with some cutoff: 

+x 

{f*g){x')^[dxf{x)g{x'-x). (4.42) 



~x 

We can evalute such an integral by sampling it at N uniform points between —X and +X 
and applying some quadrature rule. If we use points in the quadrature, then all of the 
information in the convolution can also be contained with N points. Thus, a full evaluation 
of the convolution over its support requires 0{N'^) operations. 

Similar arguments apply to the evaluation of a discrete Fourier transform (DFT) at its in- 
dependent frequencies. However, the use of the Fast Fourier Transform (FFT) algorithm|37j 
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allows evaluation of all independent frequencies of a DFT in only 0{N\ogN) operations. 
This development can be applied to convolutions via the Convolution Theorem: 

T{f*g)=Hf)^H9), (4.43) 

i.e., the Fourier transform of the convolution of / and g is equal to the product of the 
Fourier transforms of / and g. The quadrature sampling points map to N independent 
frequencies in a DFT. Thus, by evaluating the FFTs of / and g, multipyling, and taking 
the inverse FFT, we can calculate the convolution of / and g with 0(A^log A^) complexity. 

We now apply these arguments to the evaluation the convolution of a diffuse source with 
the PSF. To apply a 2D FFT, we need 'D{E, Q) and fwpsiiE, il) evaluated on a regular grid. 
The diffuse map, of course, is a curved space, so in making such a projection we make an 
approximation. However, since our grids are typically 10s of degrees on a side, a projection 
does not introduce too much distortion. We assume for now that we are working on the 
equator in the Galactic coordinate system. We use the plate carree projection such that the 
Cartesian coordinates of the projection x and y are simply / and b. (The carree projection 
is approximately equidistant near the equator; this is important, since the PSF depends 
only on the angular separation of two points.) We cut off the integral to a square grid with 
a side length of 2A. Then, the convolution becomes 

r{l\b',E) = 11 dnV{EJ,b) UpsdEj' -l,b' -b) (4.44) 
+A +A 

- I dl I dbV{E,l,b) UpsdE,{l' - I? + ib' - bf) (4.45) 

-A -A 
i=+N j=+N 

i=-Nj=-N 

The prescription is now clear: evaluate D and fwpsf on the N x N grid, perform a 2D 
FFT on each, multiply, and invert, and store the final N x N image plane containing the 
convolved diffuse model. By interpolating the pixels of this image, we can evaluate the 
convolved diffuse model for any il. within the grid, i.e., we can compute the rate in Eq. 14.381 
trivially. We note, importantly, that we normalize fwpsf fo the grid, with the net effect that 
"diffuse photons are conserved". This adjustment inevitably leads to some edge effects, as 
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at low energies some photons are scattered into/out of the ROI. We thus choose a grid 
size to minimize these effects on the model in the ROI itself. Currently, for each band, we 
calculate the 95% containment radius of the PSF and require the grid be 2 x {Rroi + R95) 
on a side. 

We note a few technical details. First, we are effectively rebinning the model of the 
diffuse source onto the uniform grid. Therefore, the pixel size of the uniform grid should 



of the Galactic diffuse uses 



be sufficiently small to avoid artifacts. The glLiem_v02 model 
0.5° pixels, so a sufficiently fine uniform grid size is 0.25°. Second, the above derivation 
relies on a convolution at the equator, where the carree projection is relatively distortion 
free and, more importantly, equidistant. In order to use this approach generally, we perform 
the following procedure for an ROI centered on 

1. Construct a uniform grid centered at (/,0). 

2. Right-handedly rotate each point on the grid by b° about the vector pointing to 
{I - 90°, 0) to get {lr,br). 

3. Evaluate 'D{E,lr,br) for each point on the rotated grid. 

4. Perform the FFTs and convolution. 

To evaluate the convolved background model at a particular point, say {l',b'), the process 
is reversed: rotate by —6° about {I — 90°, 0) and then linearly interpolate the grid values. 
(Another benefit of using the plate caree projection is the ease of indexing and interpolation, 
since the indices map directly to geographic coordinates with only arithmetic operations.) 

4.4 Calculating and Maximizing the Likelihood 

We have now described in some detail how the rates for point source and diffuse sources 
appearing in the likelihood, Eq. \3.1'6[ are calculated in the pointlike framework. Particu- 
larly, we have demonstrated how to calculate rates for bins in energy and event conversion 

^^http:/ /fermi. gsfc.nasa.gov/ssc/data/access/lat/BackgroundModels. html 



59 



type. We formalize this procedure in pointlike with the a band, an object (in the computer 
science sense) that associates data and source rates for a particularly energy range and 
conversion type and performs a series of tasks to link the two and facilitate computation of 
the likelihood. 



4-4- 1 The Band Object 



In terms of data, each Band maintains an array storing the positions ilj and counts Ui of 
the occupied pixels in the ROI that fall within its energy bounds. 

Next, the Band is responsible for pre-computing some quantities. It causes E^^t (Eq. 
I4.23P for its energy range/conversion type to be determined and, from this, determines and 
stores the proper (approximate) PSF fwpsf{Eopt) introduced in Eq. 14.111 It determines 
and stores the Simpson's energies Ei and Simpson's weights, Wi, from Eq. 14.361 For each 
point source, referenced by index j, a Band computes and stores, the overlap integral, Oj 
(Eq. 14.24p . and the exposure correction, aj = e(l GeV, i1j)/e(l GeV, llo) appearing in Eq. 
14.361 Using f^psf, it evaluates fij, the (normalized) contribution of the jth source to the 
ith pixel, i.e., fij = /wpsf (^opt, ^i] ^j), appearing in Eq. 14.271 

To actually evaluate the log likelihood, we provide a A. Let Aj be the subspace of A that 
parameterizes point source j. The Band then evaluates the source rate using its cached 
Simpson's rule information: 

Nj{\j) aj / dETj{E, Xj) e{E, flo) » Yl ^0 (^•^'') 



aj X Ws • J^jiE, Xj 



(4.48) 



Ignoring diffuse sources for the moment, the log likelihood for the Band is then 

urccs 



iog/:(A) 



+ 



i=i j=i 



-O ■ N{X) +n-log f-iV(A) 



(4.49) 
(4.50) 



We have written the above equations for the source rate and log likelihood both as sums 
and as vectors to make clear (a) what is being done operationally and (b) what is being 
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done internally. It is easy to appreciate, especially in the vector notation, the computational 
savings gained by factoring the spatial and energy terms. When we update A, all that is 
required to update the likelihood is a handful of vector arithmetic operations! The vector 
notation is also schematic for how the computation is carried out in the Python source 
code. Despite Python's interpreted nature, it ispossible with numpil^^ to run vectorized 
code at nearly the same speed as compiled code^i. We thus strive to formulate vectorized 
implementations whenever possible. (To avoid confusion, we note that, from left to right, 
the three inner products above are in "source space" , in "pixel space" , and in "source space" , 
respectively.) A second benefit to this notation is it makes determination of the gradient of 
the log likelihood, dlog C{X)/dX, more tractable, as we shall see below. 

Band objects also manage the likelihood calculation for diffuse sources in a manner 
similar to that described above. Recall that, after constructing an image of the convolved 
diffuse model, we can evaluate the quantity 

r{n',E) = J dnT{E,n;X)e{E,n) Up,i{n';n,E) (4.51) 

for a given diffuse source by interpolating from the convolved model values evaluated on 
our uniform grid. To integrate the rate over the ROI, we simply average all pixels on the 
uniform grid lying within the ROI, i.e., 

dn'riCl', E) « ^ "^^^ V li^Roi di, (4.52) 

arid ' 

-LidROI 

i=0 

where di is the the model value for the ith. grid pixel and we use the indicator function X to 
select only grid pixels lying within a circle of radius Rroi- ^^i^) is a scaling model which 
evaluates to 1 for all energies at the "default" parameter values. We integrate both D(E) 
and r{n',E) over the band using Simpson's rule analogously to the procedure for point 

^*http:/ /numpy. scipy.org 

^^It is also possible to extend the Python language with functions written in C, but numpy typically 
suffices. 
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sources and define 



D = J dED{E), 
d{^, A) = j dE r(pl, E)N{E, A). 



(4.53) 
(4.54) 



By using fixed Simpson's rule points, we are free to pre-compute and store the quantities 
r{Q,E) and / dO! r{p! ,E), and updates from the likelihood will only require a sum with a 
few terms, viz. the Simpson's rule sum. With these quantities, we now have the fuh Band 
likelihood including point sources and diffuse sources: 



iog/:(A) 





\_j= 


1 


_i=i j=i 










6 ■ N{X) - 




+ n • log 





Hi log J2 ^ji^j) fij + E '^^j-^^) 



(4.55) 
(4.56) 



where Npg {N^s) indicates the number of point (diffuse) sources. We leave the sum over 
diffuse source explicit since there are typically only a few such sources modeled. 

4-4-^ Calculating the Likelihood Gradient 

We preface discussion on likelihood fitting with an important ingredient, an "analytic" 
calculation of the gradient — as opposed to finite differences — that provides significantly 
faster maximization of the log likelihood and determination of its curvature. The calculation 
of dlog jC{X)/dXi is complicated by the many-to-one mapping of parameters to sources. We 
introduce the auxiliary matrix M defined by = dNj{X)/dXj. This ATgource x ^param 
matrix will only have nonzero entries where the parameter in the denominator matches a 
parameter of the source in the numerator. Then, ignoring diffuse sources. 



Vlog>C(A) = -0-M + 



n 



f • A^(A) 



[f.M], 



(4.57) 



where by n/{f ■ N{X)) we mean elementwise division of the numerator by the denominator. 
We note that the denominator of this term has already been determined in the log likelihood 



62 



calculation. Thus, in practice, we first build up M, requiring Nparam Simpson's rule quadra- 
tures and some bookkeeping, and the remainder of the gradient can be computed with a few 
matrix operations. Terms arising from the diffuse sources can be incorporated analogously. 
This method should be compared to > 2Nparam likelihood evaluations required to estimate 
V log£(A) using finite differences; the "analytic" approach is also free from numerical errors 
resulting from inappropriate step size choice. 

4-4-3 Spectral Models 

Up to now, we have been fairly abstract about J^(E,X), the source flux. A variety of 
spectral models are available in pointlike, including those in most common use, a power law 
and a power law + exponential cutoff. Using the computer science principle of inheritance, 
particular spectral models inherit from a single base class. The base class manages all 
common tasks, while the subclasses implement the methods particular to the spectral model, 
e.g., the density, dN/dE{E,X), and the gradient, d{dN/dE{E,X))/dX- 

Parameters for spectral models are almost universally positive definite, and negative 
values are undefined or unphysical. To prevent the fitting algorithm from attempting these 
disallowed values, we perform a logarithmic transformation of all parameters internally. 

4-4-4 Maximizing the Likelihood and Estimating Errors 

The default fitter for pointlike is the implementation of the BFGS[70j algorithm in scipu^^ 
via fmin_bfgs. This algorithm uses the computation scheme for the log likelihood and its 
gradient outlined above. If a spectral model is included for which it is difficult to implement 
dlogC{X)/dXi, then the downhill simplex algorithm — which makes use of the log likelihood 
only — implemented in scipy via fmin is also available. 

We estimate the errors from the information matrix, the inverse of the hessian of the log 



'http://www.scipy.org 
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likelihood function. That is, the covariance matrix is estimated by 



(4.58) 




9^ log C 
dXidXj 



(4.59) 



If the analytic gradient is available, we employ a first-order finite difference scheme, while 
in the absence of a gradient we perform second-order finite differencing of the log likelihood 
function. In both cases, we use an iterative process that attempts to find an ideal step size 
for estimation of the curvature, i.e., one for which the change in the log likelihood is of order 
unity. 

4-4-5 Source Localization 

While we have concentrated on spectral parameters, the source position is also a model 
parameter and we can find its MLE in a similar process. Whereas the approximations we 
have made allow us to fit spectral parameters for point sources and diffuse sources without 
re-evaluating model values for pixels (e.g., f in Eq. I4.55p . this is manifestly not the case for 
source localization since we must obviously re-evaluate the row of f corresponding to the 
source being localized as its position is updated during ML fitting. To speed up this process, 
we adopt an iterative process for source localization. An initial spectral fit is performed, and 
then the spectral parameters are taken as given while the position of the source in question 
is varied to maximize a two-dimensional likelihood function (i.e., two angular coordinates). 
The improved position can be used in a second spectral fit, and so forth until convergence. 
The only drawback to this approach is a potential slight underestimate of the positional 
uncertainty since correlation with spectral parameters is not taken into account. However, 
tests of the localization algorithm presented in Chapter [5] indicate such an effect is negligible. 



Let 6 = f • N{X) + dW - fs Ns{X) (see Eq. Ii35]l . i.e., the rate in each pixel for ah 



sources under the best-fit spectral model with the rate for one point source, labeled "s", 
subtracted. This quantity is constant when the position of "s" changes. The log likelihood 
is then 



logC{n) = -Os{n)Ns + n-log b + fs{^})Ns 



(4.60) 
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To maximize the function, we fit a quadratic form to the log hkchhood surface via least 
squares and iterate until the position change is sufficiently small. The final quadratic form 
measures the likelihood surface and provides an estimate for the uncertainty in the two 
position parameters. 

Finally, we note that this formulation is independent of the spectral model used for 
source "s", and while we typically employ one of the usual spectral models, e.g., a power 
law, we can also use a "model-independent" approach outlined below. 

4.4-6 Model-independent Fits 

In the previous sections, we have concentrated on the Band. Of course, when fitting a 
(broadband) spectral model, we accumulate the log likelihood contributions from each Band 
to compute the total likelihood. However, we may also be interested in the likelihood from 
a single band. The typical use case is assessing the shape of the spectral energy density of a 
source, e.g. to look for deviations from the broadband model. In this procedure, we model 
all sources initially with a broadband spectral model and maximize the likelihood to arrive 
at a good spectral model for the ROI. We then freeze the parameters for all sources save 
source "s". We assume that, within the band, the source has a power law spectrum with 
r = 2.0. (Since the bands are narrow, there is little dependence on the particular slope 
chosen.) Then, the log likelihood for the source flux within the band is given by 



jC{J^s) = -Os ^ J^sn ■ log 



(4.61) 



with Fqs the flux calculated for source "s" in the initial spectral flt. This single-dimension 
function is easily extremized by differentiating and finding the root of the resulting equation. 
Thus, J-g can be rapidly estimated in each band, yielding a band- by-band estimate for the 
spectral energy density for the source. These estimates can be used for localizing sources 
as outlined in the previous section, or for generating plots of the spectral energy density. 
For the latter application, we typically use the joint likelihood for the Band objects for 
front-converting and back-converting events at the same energy. 

In this prescription, the only sensitivity of the final band-by-band fits to the Ansatz 
broadband model comes from the other sources. Provided that model is not too far wrong — 
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or that source "s" is not too bright — we expect the initial model to have little effect on the 
band-by-band estimates. 

4.5 Summary 

In this chapter, we laid out the major design principles and detailed the implementation of 
pointlike, a full-featured maximum likelihood analysis package. We first outlined our method 
of binning the data with pixels that scale with the PSF to achieve good performance at low 
energy and good accuracy at high energy. We next began the "heavy lifting" of evaluating 
the source rates folded through the Fermi-LAT IRF by distinguishing point sources and 
diffuse sources. For point sources, we defined an effective PSF for each band by estimating 
an optimal energy at which to evaluate the King function parameters, and we used this PSF 
to determine point source contributions to data pixels and in the overlap integrals, which we 
reduced to quadrature. For diffuse sources, we described a method for evaluating the diffuse 
model on a locally flat grid and then applying the Convolution Theorem to quickly evaluate 
the convolution with a two-dimensional Fast Fourier Transform. Finally, we described how 
these ingredients are unified in a Band object to facilitate calculation of the likelihood and 
its gradient. 

Along the way, we attempted to validate each piece of the calculation and verify that 
it met our goal of percent-level accuracy. In the next chapter, we test the entire pointlike 
package for actual science tasks, viz. estimating the spectral and positional parameters for 
sources under a variety of conditions. 
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Chapter 5 

THE POINTLIKE PACKAGE: VALIDATION 

Although in Chapter |4] the individual portions of pointlike have been considered and the 
efficacy of the various approximations tested, the most important check is a holistic one — 
the determination of spectral parameters from realistic data. Although some bright 7-ray 
sources can be considered standard candles, Fermi-hAT is the first HE 7-ray observatory 
with good sensitivity above 1 GeV, and comparison to past experiments can only serve as 
a sanity check[5]. Since we are concerned with percent-level effects, we instead use data 
generated with Monte Carlo simulations using a method we outline below. 

As described in previous sections, the LAT (as modeled) is entirely characterized by its 
IRF. By combining a simulated or actual history of the S/C position and orientation {FT2 
file) with the IRF and a source model, a realization of data from the sources can be created. 
This task is implemented in the gtobssim tool developed by the LAT Collaboration. We 
make extensive use of it below. Except where otherwise specified we use the P6-v3-diff IRF 
for simulating and fitting validation data. Additionally, except where otherwise noted, we 
use the true energy of the simulated photon, i.e, we turn off the effects of energy dispersion. 

The majority of LAT sources are modeled as power laws. While the brightest sources 
may require additional degrees of freedom for accurate representation, dimmer sources are 
adequately described, from a statistical standpoint, by power laws. For the dimmest sources, 
even two degrees of freedom may be too much. Thus, while we do test more complex models, 
especially power laws with exponential cutoffs given their strong connection with pulsars, 
we concentrate here on power law spectra. 

We begin by simulating a diffuse background. We adopt the model used for the IFGL cat- 
alog analysis jl2j. the glLiem_ v02 mapcube for the Galactic diffuse and the isotropic-iem_v02 
intensity tabulatioJl for the isotropic background. We simulate 1 year of data using the 



^http;/ /fermi. gsfc.nasa.gov/ssc/data/access/lat/BackgroundModels. html 
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actual FT2 file describing the Fermi-LAT position and orientation during the calendar year 
2009. Simulating diffuse sources is a slow process, and 1 year of data occupies about 1 GB 
of disk space. Therefore, for validation requiring multiple realizations of the same source 
configuration, we simulate photons in a 12° disk (radius) about the position of the Vela 
pulsar, (R.A.,Decl.) = (128.8463, -45.1735). 

Next, we simulate a point source with a power law spectrum, an integral flux from 100 
MeV to 200 GcV of 10^^ ph cm^^ s^^ and a photon index of either 1.5, 2.0, or 3.0, spanning 
the typical spectrum of Fermi-LAT source spectra. For convenience, we define J-x = 
10-^ph cm s and let indicate a general flux. A source with = J% is comparable in 
brightness to the Vela pulsar, the brightest steady 7-ray source in the sky. We are interested 
in the performance of pointlike over a range of source fluxes or, more importantly, signal- 
to-background ratios. Strong sources expose inaccuracies when statistical fluctuations are 
unimportant, while weak sources test both statistical issues (e.g., error estimate) and bias 
from strong backgrounds. Rather than simulate different data sets for each flux, we resample 
a subset of photons from the original J-5 file using the following prescription: 

1. Draw a Poisson random variable Ntar from a distribution with rate NtotJ^/T^. 

2. Select Ntar photons uniformly from the original file set. 

Here, Nfot is the number of photons in the original data set and T is the target flux. Note 
that we make no cut on incidence or zenith angle. 

We simulate 20 base sets with T = T5 each with an integration time of 1 year, and from 
these we can generate subsets for flux-dependent study of ensembles of sources. But before 

we proceed to ensemble testing, we take advantage of the extreme precision allowed by 
summing data from many MC realizations to verify various aspects of the analysis outlined 
above. 

5.1 Validation with Summed Data 

While the 20 independent realizations are valuable for ensemble testing, by combining them 
into a single data set we can test pieces of the pointlike framework to high precision. When 
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combined, since we use the exposure for only a year, we are essentially using a year of 
data for a source with a flux of Likewise, the diffuse background is 20 times brighter 
than in actuality. The resulting quantity of photons, a few million, allows measurements of 
deviations of the model and data to a few percent. In the following sections, we examine our 
implementation of the point-spread function and convolution of diffuse sources. We conclude 
with a comparison of the maximum likelihood parameters estimated from the summed data 
with their Monte Carlo truth values. 

5.1.1 Checking the Band PSF Shape 

We wish to verify two aspects of the approximate exposure- weighted PSF we defined for each 
energy band (see Eq. 14. lip . First, do we correctly characterize the shape of the distribution 
of photons for a point source, and second, do we accurately calculate the overlap integral, 
Eq. 14.241 ? We begin with the PSF shape. 

First, recalling the azimuthal symmetry of the PSF, in each band we bin the photons in 
9, the angular separation of the reconstructed photon position and the point source position. 
(Since we are using binned data, the reconstructed photon position is actually associated 
with a HEALPix pixel center, but this position differs negligibly from the actual position 
by design.) To achieve an approximately equal number of photons in each band, we invert 
the cumulative distribution function furnished by the on-axis PSF and use the resulting 
quantiles such that, e.g., ^ 5% of the photons are in each 9 bin. (We choose bins with equal 
counts, rather than bins with equal size in 9 or 9'^, so that the statistical weight of the bin 
on the results is easy to infer.) 

In calculating the expected number of counts in a particular bin, we must account for 
the "ragged binning" of the HEALPixels. To do this, we simply integrate the PSF over 
each 9 bin numerically by evaluating the PSF at each HEALPixel, occupied or not, in the 
9 bin, and take the mean. Since we are interested in the shape, we normalize the sum of 
the predicted counts over all 9 bins to the total observed counts. Importantly, then, these 
results are independent of how accurate the source model and estimate of the exposure are. 

The results for a power law source with F = 2.0 are shown in Figs. 15. 1.1} I5.1.H and 
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Figure 5.1: The PSF profile for bands witli front-converting events between 100 and 1000 
MeV. The x-axis gives the angular separation of the bin center from the point source position 
in degrees and is on a logarithmic scale. The y-axis indicates the relative difference, observed 
less predicted, in percent. These results are for a source with a power law spectrum of photon 
index 2.0. 
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Figure 5.2: As Figure [5.1. 11 for bands with front-converting events between 1000 and 10000 
MeV. 
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Figure 5.3: As Figure 15.1.11 for bands with front-converting events between 10000 and 
100000 MeV. 
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r = 1.5 


r = 2.0 


r = 3.0 


Alog£ (Front) 


287 


206 


82 


A log £ (Back) 


539 


378 


156 



Table 5.1: The change in the log likelihood for 20 combined years of Monte Carlo data for 
three different spectral shapes when comparing band PSFs based on Egeo, the geometric 
mean energy for the band, and Eopt, the optimal energy estimated by Eq. 14.231 

I5.1.1[ These results, for front-converting events, are representative of results for T = 1.5 and 
r = 3.0 and for back-converting events. Generally, the outcome is quite good. For many 
bands, the residuals are perfectly flat to within the statistical error bars. Others show a 
slight trend, i.e., the PSF is too narrow (broad), resulting in negative (positive) residuals 
in the core and positive (negative) residuals in the tail. This is to be expected, as the band 
PSF is only approximate. 

We can also investigate what gains we made by optimizing the energy at which we 
evaluate the scaled a parameters for our band PSF -Eopt (see §4.2.2p for our band PSF. 
We compare in Table 15.11 the log likelihoods for the three spectral shapes using both -Eopt 
as estimated by Eq. 14.231 and the simple prescription of E'opt = Eg^o, i.e., choosing the 
geometric mean energy as the optimal energy. There is an appreciable increase in the log 
likelihood with Sopt as estimated from Eq. 14. 231 

5.1.2 Checking the Overlap Integrals 

Next, we compare the total number of counts predicted by the model to that actually 
observed in the entire ROI. The accuracy of the predicted counts depends on the value of 
the exposure, the proper integration of the spectral model over the energy band, and the 
correct calculation of the PSF overlap. The exposure is typically binned with 1° pixels, and 
varies sufficiently slowly that the resulting error is < 0.5%. Likewise, we have shown in 
Figure [3. 2. 21 that the integral of the spectral model introduces negligible error. The primary 
source of error is then the calculation of the overlap integral, which in turn comprises two 
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error sources. Numerically, there is neglibible error, but since we use a single, approximate 
PSF for the band, the overlap integral will invariably depart from the true overlap. Second, 
while we calculate the overlap integral assuming a circle aperture, the use of binned data 
results in an actual data set with a "ragged" edge. 

We test these effects with the production code by again summing all 20 years of Monte 
Carlo data and comparing the observed and actual counts in a 10° ROI. Since we are 
interested in the absolute accuracy of the model predictions, we do not renormalize the 
data or perform an initial spectral fit. The results appear in Figures 15.1.21 and 15.1.21 and 
indicate overall accuracy of about ly^. 

5.1.3 Checking the Convolution 

The accuracy of the diffuse convolution is extremely important, particularly in the Galactic 
plane, because the the diffuse counts dwarf counts from all but the brightest point sources. 
Small relative inaccuracies can become extremely statistically significant and bias point 
source fits. 

We first validate the spatial distribution, Eq. 14.371 as follows. For each data pixel in 
the ROI, we evaluate Eq. 14.371 as determined from our convolution scheme (including 
integration over energy) at the pixel center. Since the pixels are small compared to the 
PSF, this is an accurate estimate of the pixel rate. We then examine the weighted residuals, 
{Nobs — ]^mod) / V Nmodi as a function of position. Representative results for two energy 
bands are presented in Figures [5.1.31 and 15.1.31 and are quite featureless, i.e., the model is 
correct in both shape and magnitude. 

At energies above 1 GeV, the pixelization becomes too small for this scheme to work. 
However, above 1 GeV, the convolution is less important as the PSF scale is small compared 
with the model scale (0.5° for glLiem_v02). For the remainder of the validation, we simply 
present the integral of Eq 14.371 over the aperture in Figure 15.1.31 The model predictions 
agree with the Monte Carlo data to about 1% at all energies, although the poor statistics 
(from the relatively soft spectrum of the Galactic diffuse) at high energy preclude validation 



^The trend to overpredict the counts from 0.1 to 1 GeV is under investigation. 
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Figure 5.4: Comparison of modeled and observed counts for a point source with a F = 2.0 
spectrum. The histogram gives the model expectation and the data points the observed 
counts. The residuals indicate that the accuracy is better than 1% at all energies. The 
minor but definite trend for an excess in the expected counts below 1 GeV can lead to a 
very small bias in the estimation of the photon index — see Table 15.21 It is in this energy 
range the effects of PSF approximation and "ragged" edge effects are most pronounced. 
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Figure 5.5: Comparison of modeled and observed counts for a point source with a T = 1.5 
spectrum. We include this model for its superior statistics at high energy, where it is clear 
the accuracy remains better than 1%. 
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Figure 5.6: The lefthand panels show the weighted residuals, {Nobs — Nmod) / V ^mod, for 
the diffuse background {glLiem_v02 + isotropic-iem_v02) at low energy in zenithal equal- 
area (ZEA) projection. The top (bottom) row shows front-converting (back-converting) 
events. The "ragged edge" of the HEALPix data compared to the ROI boundary (blue 
circle) is clear at these low energies, particularly for the back events. The righthand panels 
show a histogram of the weighted residuals. There are many counts per pixel for these 
low energies (mean 102 (360) for front (back)), and the residuals should (and do) follow a 
normal distribution (shown in red). The means of the distribution are consistent with 0, 
indicating no statistically significant bias. 
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Figure 5.7: Weighted diffuse residuals at moderate energies. The spatial residuals are again 
featureless. The count rate per pixel is at the limit of normal approximation (8 (24) for 
front (back)), and the residuals for the front-converting events clearly show a long right 
tail in keeping with the asymmetric Poisson distribution. (The mean is accordingly slightly 
biased.) The "ragged" edge is essentially eliminated by 1 GeV for nearly any reasonable 
ROI size. 
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Figure 5.8: The predicted and observed counts for the diffuse background {glLiem_v02 + 
isotropic-iem-v02) for ah energies and conversion types. 



beyond about 5%. 

5.1.4 Checking the Maximum Likelihood Fit 

We test the absolute level of accuracy of the maximum likelihood estimates of the spectral 
parameters by performing maximum likelihood fits to the summed data sets, i.e., by mea- 
suring the spectrum of a single point source with T = 2Ti. Such an exercise indicates the 
magnitude of any bias for a point source in the limit of very small statistical errors. Table 
15.21 shows the results for a point source alone, indicating better than percent-level accuracy. 
Table [5^31 shows results for the same sources in the presence of the diffuse background. The 
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Table 5.2: The accuracy of maximum-likelihood estimated parameters for three point 
sources with a flux of 2 x 10~^ph cm~^ s~^ and three different power law photon indices. 
The agreement is much better than 1% for all three sources, and there is additionally good 
agreement between the estimates for front-converting events and back-converting events. 
The general trend is for the source-as-fit to have a harder spectrum than simulated; this is 
in keeping with the slight excess of modeled events at low energy, as a hardened spectrum 
ameliorates this excess. 



additional degrees of freedom lead to a modestly increased bias in the flux measurement, 
but the accuracy is still about 1%; the accuracy of the photon index remains much better 
than 1%. 

5.1.5 Checking the Maximum Likelihood Fit: With Energy Dispersion 

Previously, we have ignored the effects of energy dispersion by using data binned using the 
simulated energy for each photon. To illustrate and gauge the effect of energy dispersion, 
we repeat two of the above exercises with data binned using the reconstructed energy. In 
Figures 15.1.51 and 15.1.51 the residuals — in which the model neglects energy dispersion but 
the data contains it — for two point sources with power law spectra appear. It is clear 
that energy dispersion is not entirely negligible! The residuals can be of order 10%, and the 



dispersion introduces slight curvature in the spectru 



However, it is clear that much of the 



curvature can be eliminated by restricting analysis to photons with reconstructed energies 

^The dominant mechanism is as follows: the effective area increases rapidly from 0.1 to about 0.5 GeV, 
while the energy dispersion is approximately symmetric in logarithmic energy, and there is a net "migra- 
tion" of photons from high to low energy, causing the model to underpredict at low energy and overpredict 
at high energy. 
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Table 5.3: As Figure 15.21 but with the Galactic and isotropic diffuse backgrounds added. 
The agreement remains at the 1% level. 
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Table 5.4: As Table [521 but now including the effects of energy dispersion in the data. No 
diffuse background is included. 



above 200 MeV. Then — for power law sources, at least — the effect of energy dispersion is 
primarily a renormalization of the spectrum, decreasing the measured flux below its true 
value by a few percent. 

While the residuals look somewhat dire, the parameter values for bright sources are 
not significantly affected. Tables 15.41 and 15.51 show the absolute deviation of the measured 
parameters from the Monte Carlo truth without and with, respectively, the presence of a 
diffuse background. The bias from neglect of energy dispersion is < 5%. 

5.2 Spectral Analysis of Ensembles: VeriGcation of Central Values and Error 
Estimates 

The most important capability of pointlike is reliable and rapid extraction of spectral param- 
eters and estimates for their statistical errors. To test this, we perform a spectral analysis 
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Figure 5.9: As Figure 15.1.21 but for a power law source with photon index of 1.5 and 
allowing for energy dispersion in the data. The model value is fixed at the Monte Carlo 
truth, so the residuals indicate how the photons have been redistributed from the true values. 
The intrinsic spectrum of hard sources enhances the tendency for high energy photons to 
redistribute to low energies. 
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Table 5.5: As Table [5^ but now including a diffuse background. 
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Figure 5.10: As Figure [5.1.51 for a power law source with photon index of 3.0. Soft sources 
show a decreased curvature at low energies, but the overall loss of photons from the passband 
is increased. 
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of 20 Monte Carlo realizations of the diffuse background and a point source at a given flux. 
We plot the results for a series of decreasing point source fluxes, F^, and S-Fg. The 

latter flux is below the detection threshold of the LAT for modest (1-2 years) integration 
baselines for sources near the Galactic plane, while the penultimate (J-g) is marginal. J-7 
is the flux of a typical Galactic source. We show the results both in absolute units and in 
units in which the parameters become approximately bivariately normal. 

5.2.1 Spectral Parameters 

We consider first Fig. 15.2.11 which shows a scatter plot of the estimated integral flux and 
photon index for a power law source with F = 2.0 in error-weighted units: x'^ = [xi — fix)/o'i, 
i.e., we have subtracted from each estimated parameter the sample mean and then divided 
by the estimated error for the parameter. If the error estimates are accurate (implying the 
likelihood surface is approximately Gaussian), then the error- weighted estimates should be 
normally distributed. Thus, we draw the confidence contours of a two-dimensional Gaussian 
(using the sample covariance matrix) on each figure to determine if the appropriate sample 
fraction lies within the appropriate contour. In general, this is the case, indicating (a) the 
Gaussian approximation of the likelihood surface is a good one and (b) the likelihood surface 
is being accurately measured by the error estimation algorithms. 

The sample mean has an error that scales as the square root of the sample size, say 
With sufficiently accurate estimates of the parameter errors (bright sources) and/or 
sufficiently numerous Monte Carlo realizations, we can begin to resolve systematic bias in 
the parameter estimates. We adopted the former approach in the previous section and 
explore the latter approach here. The magnitude of the bias, in units of the statistical 
error of a single source, can be read off of the figure by the position of the black cross with 
error bars. For the bright sources (J-" = Fq), systematic bias is detectable but is at most 
comparable to the statistical error. For dimmer sources, the statistical errors dominate. 

To gauge the absolute magnitude of both the statistical and systematic errors, we display 
the absolute parameter values in Fig. 15.2.11 (Note we have still scaled the flux by the 
simulated value.) Here, the position of the parameters relative to the contours are not 
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meaningful and are only shown for reference. In the case of F = 2.0 and = J^q, we see 
the absolute error on the flux is < 2%. 

5.2.2 Position Parameters 

To check the accuracy of the MLE for the source position and position uncertainty, we first 
performed a ML fit for the spectral parameters and followed with a ML fit for the position. 
We did not iterate further. The deviations in relative units and error- weighted units for 
Right Ascension and Declination are shown in Figures 15.2.21 and 15.2.21 indicating that the 
mean of the position estimate is consistent with the simulated value and the the position 
uncertainty estimates are not too small, indicating only a small correlation between the 
position parameters and the spectral parameters. Some of the low-flux < Ts) sources 
have a signal-to-noise ratio too low for robust localization. 

By symmetry and the apparent insensitivity of the best-fit position to the precise value 
of the spectrum, we conclude that position estimates will be unaffected by energy dispersion. 

5.3 Summary 

By combining 20 years of simulated data for bright sources, we demonstrated that machinery 
used by pointlike to calculate expected rates for point sources and for diffuse sources is 
generally accurate to within a few percent of the expectation. By performing spectral (and 
position) ML fits on ensembles of sources, we assessed the accuracy of the central values and 
uncertainty estimates reported by pointlike over 2.5 decades of source brightness, finding 
generally good agreement with systematic bias < 10%. 
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Figure 5.11: Results for T = 2.0 in error-weighted units. The contours give the 68%, 95%, 
and 99% confidence intervals as estimated from a Gaussian distribution using the sample 
covariance matrix. Blue circles (red squares) are the 68% (32%) of the sample closest to 
(farthest from) the mean. The single, black errorbar at the center of the contours gives 
the sample mean, and the length of the error bars gives the 1-d sample standard deviations 
divided by \/iV. The black axis marks crossing at the origin represent the simulated ( "true" ) 
values. Thus, a failure of the cross to overlap the origin indicates a systematic bias in the 
parameter estimate. The position of the cross relative to the origin gives the magnitude of 
the bias in "sigma" units. Sources with a test statistic < 10 are indicated with a cross-filled 
symbol. 
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Photon Index = 2.0 
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Figure 5.12: As Figure 15.2.11 in relative units. The thick dashed hne indicates zero flux. 
Absolute outlying values are more apparent at low fluxes. 
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Figure 5.13: The angular deviation in absolute units (°) for the position in the R.A. and 
Decl. directions. The number of sources with successful position fits is indicated by N. 
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Figure 5.14: As Figure [5.2.21 but with error- weighted units. 
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Chapter 6 

APPLICATIONS OF POINTLIKE 

Having detailed the inner workings of pointlike and vahdated the tool on a variety of 
test cases, we now outline some of the science results pointlike is capable of delivering. We 
start with "single-source" analysis — by which we mean analysis of a relatively small region 
of the sky — to demonstrate the flexibility of pointlike. To showcase the speed of pointlike, 
we then develop an all-sky analysis in which we determine spectra for every known GeV 
source. Finally, we make use of this "optimal" model for other analysis tasks. 

6.1 Single-source Analysis 

To illustrate the use of pointlike in its "single-source" mode, we provide an example analysis 
of sources in the Cygnus region of the sky. This region, rich in sources, has already delivered 
multiple discoveries, including the LAT's first detection of a new radio-loud pulsar [8], a 
particularly bright radio-quiet pulsar in the 7-Cygni supernova remnant [3], the discovery of 
or bitally- modulated emission from Cygnus X-3|48j. and one of the first pulsars discovered 
in radio after detection by Fermi-hAT [^SSj- This is clearly a region that rewards accurate 
analysis! 

6.1.1 Data Prepaparation 

The FTl and FT2 files are obtained from a database maintained at SLACo. Similar data 
products are available from the FSSCg. For this exercise, we select data obtained between 
4 August 2008 and 8 August 2010, processed using "Pass 6" reconstruction algorithms. We 
restrict data to events with a reconstructed position lying with 15° of PSR J2021-I-3651 and 



^http://glast-ground. slac.stanford.edu/DataPortalAstroServer/, SLAC credentials required 
•^http: / /fermi. gsfc.nasa.gov/ssc/data/access/lat/ 
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with reconstructed energies between 100 MeVtj and 100 Gev. Finally, we use "diffuse" class 
events, a subset of events with a low background contamination. 

Before ingestion into pointlike, we apply the Science Tool gtmktime^ to the data. The 
tool applies a general filter based on the S/C orientation and position. We apply a filter 
that modifies the GTI in the FTl file to exclude times when the S/C exceeds a 52° rocking 
angle and when the 15° ROI defined in the preceding paragraph approaches the horizon. 
These excisions are designed to decrease contamination from Earth's limb and, since they 
use the known S/C telemetry are essentially bias free. 

Additional preparation, binning, and generation of secondary data products are per- 
formed within pointlike. These procedures are discussed in Chapter HI and we only provide 
some of the specifics here: data are binned with eight-bin-per-decade resolution, the live- 
time as a function of position is tabulated with 1° resolution, and we remove events with 
reconstructed zenith angles > 105° and reconstructed incidence angles > 66.4°. 

6.1.2 Source Model 

After preparing the data, we must construct a spectral model for the sources in the region. 
Models for both the diffuse background and point sources are in a state of continual improve- 
ment as more data is acquired. E.g., additional photons allow dimmer point sources to be 
detected, which in turn allow a better diffuse model to be generated, in turn allowing more 
point source detections, and so forth. Systematic refinements independent of integration 
time also allow for model improvement. 

pointlike is capable accepting a wide variety of input spectral models. For instance, it can 
parse all published Fermi-LAT point source catalogs as well as many source lists internal to 
the LAT Collaboration. It can ingest mapcubes representing a variety of diffuse emission. 
However, rather than using the most recent internal models, we choose to make contact with 
the published literature and adopt models used in and resulting from the 1FGL[12] catalog 
analysis. For diffuse models, we use the glLiem_v02 mapcube for the Galactic diffuse and 



■^To ameliorate effects of neglecting energy dispersion, we restrict analysis to energies above 200 MeV. 
^http:/ /fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt 
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the isotropic-iem-v02 intensity tabulatiorj^ for the isotropic background. For point sources, 
we use the spectral models presented in the IFGL catalog. 

In general, there is a tradeoff between a statistically-optimal fit — a joint fit in which 
the parameters for all sources are allowed to vary — and the computational intractability of 
minimizing a high-dimensioned function with a large data set. The typical approach is to 
select an ROI centered on the source of interest and allow sources within a fixed angular 
separation — say 5° or 10° — to vary along with the source of interest. Sources at larger radii 
remain fixed. Due to the large tails of the PSF at low energy, sources lying outside the ROI 
must be included in the source model, and it is clearly impossible to fit these sources. To 
estimate parameters for these sources, similar analyses must be performed for other regions 
of interest in what is essentially a bootstrap procedure. We shall have more to say on this 
in devising an iterative, all-sky fit below. 

In the following analysis, we model the 53 IFGL sources within 20° of the ROI center, 
and we maximize the likelihood with respect to the 17 IFGL sources within 8° of the ROI 
center as well as three parameters for the diffuse background, a power law scaling for the 
Galactic diffuse and a normalization for the isotropic diffuse. 

6.1.3 Broadband Spectroscopy 

The first step in any analysis is to maximize the likelihood over the free sources. This is done 
using the gradient fitter as described in Chapter [H During this process, spectral parameters 
and estimates are determined for all 17 free sources. The process takes 1 — 2 minutes. As an 
example of the output, and to begin an example on the importance of correct modeling, we 
report the broadband spectral parameters for two sources, PSR J2021+3651[8] and IFGL 
J2015. 7+3708, thought to be a blazartMl- 



~ IFGL J2021. 0+3651 fitted with PowerLaw, e0=798 



Norm 
Index 
Ph. Flux 



(1 + 0.013 - 0.012) (avg = 0.013) 1.34e-010 

(1 + 0.005 - 0.005) (avg = 0.005) 2.34 

(1 + 0.021 - 0.021) (avg = 0.021) 1.29e-006 (DERIVED) 



^http: / / fermi.gsfc.nasa.gov / ssc / data/access /lat /BackgroundModels.html 
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En. Flux : (1 + 0.012 - 0.012) (avg = 0.012) 7.62e-010 (DERIVED) 



IFGL J2015. 7+3708 fitted with PowerLaw, e0=1051 



Norm 
Index 
Ph. Flux 
En . Flux 



(1 + 0.043 - 0.041) (avg = 0.042) 1.47e-011 

(1 + 0.014 - 0.014) (avg = 0.014) 2.46 

(1 + 0.073 - 0.068) (avg = 0.070) 3.27e-007 (DERIVED) 

(1 + 0.043 - 0.041) (avg = 0.042) 1.62e-010 (DERIVED) 



In this output, "eO" indicates the pivot energy, an estimate for the energy at which 
the flux density ("norm") and photon index ("index") are least correlated. The integral 
photon flux (> 100MeV,cm^^ s^^) and integral energy flux (erg cm~^ s~^) round out the 
parameters. The relative error is reported, and since the parameters are transformed to log 
space internally, they are naturally two-sided when transformed back, but for bright sources 
become approximately symmetric. 

The IFGL models only provide power law parameters for the point sources. Yet the 
two brightest sources in the region, PSRs J2021+4026[3] and J2021+3651, are known to 
have exponentially-suppressed spectra. Further, IFGL J2015. 7+3708 is only 1.1° from PSR 
J202 1+3651. To correct this flaw in the model while simultaneously demonstrating the 
flexibility of pointlike, we modify the sources interactively. We first add a cutoff energy to 
J2021+4026 and re-maximize the likelihood, increasing its logarithm by 480. We repeat the 
process for J2021+3651, improving the log likelihood by 296. We now have: 







IFGL J2021. 0+3651 fitted with ExpCutoff 



Norm 
Index 
Cutoff 
Ph. Flux 
En . Flux 



(1 + 0.026 - 0.026) (avg = 0.026) 2.01e-010 

(1 + 0.020 - 0.020) (avg = 0.020) 1.74 

(1 + 0.065 - 0.061) (avg = 0.063) 2.93e+003 

(1 + 0.034 - 0.033) (avg = 0.033) 8.18e-007 (DERIVED) 

(1 + 0.018 - 0.017) (avg = 0.018) 5.44e-010 (DERIVED) 



1 ~ IFGL J2015. 7+3708 fitted with PowerLaw, e0=1051 
Norm : (1 + 0.040 - 0.038) (avg = 0.039) 1.6e-011 
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Method R.A. Decl. r95 (°) 5/r95 

Timing 305.2728 36.858 — — 

BB, PL 305.2533 36.850 0.010 1.5 

BB, PL+EC 305.2543 36.848 0.010 1.5 

BF 305.2542 36.848 0.010 1.5 

Table 6.1: A comparison of the DC position obtained for PSR J2021+3651 with three meth- 
ods to the (weh-constrained) timing position [8J. The "BB" entries corresponding to using 
a broadband spectral model (respectively a simple power law and a power law with expo- 
nential cutoff), while "BF" refers to the model- independent method discussed in Chapter 
m The three methods are self-consistent, but inconsistent with the radio position at high 
confidence. 



Index : (1 + 0.015 - 0.014) (avg = 0.014) 2.62 

Ph. Flux : (1 + 0.071 - 0.067) (avg = 0.069) 4.67e-007 (DERIVED) 

En. Flux : (1 + 0.044 - 0.042) (avg = 0.043) 1.94e-010 (DERIVED) 

We note the reported spectrum for IFGL J2015. 7-1-3708 has softened significantly, F = 
2.46 — )• F = 2.62, well outside of the statistical errors of ±0.04, while the integral flux 
has increased by > 40%. We take up again the (cautionary) tale of this source when we 
discuss model-independent spectroscopy, but first we give a brief example of localization 
with pointlike. 

6.1.4 Source Localization 

To demonstrate the localization capabilities of pointlike, we again focus on the region near 
PSR J2021+3651. Since the position of J2021-I-3651 is well-known from radio timing, its 
brightness in 7-rays allows a validation to high statistical precision of the ML best-fit po- 
sition. The results of localizing J2021-I-3651 (having first made the spectral modifications 
described in ^6.1.3| viz. adding cutoff energies to the two bright pulsars) appear in Table 

EH 
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Method R.A. Decl. r95 (°) 5/r95 

Timing 305.2728 36.858 — — 

BB, PL+EC 305.2685 36.859 0.010 0.83 

Table 6.2: PSR J2021+3651, comparison of the DC position to the (well-constrained) timing 
position. This fit includes a new, highly significant source, and the Fermi-LAT position now 
agrees with the timing position to a precision of < 1'. 

In this iteration, it is clear that either the position is off or the uncertainty estimate is 
too small, as the best-fit position lies well outside of r95, the radius inside which we expect 
the ML position to lie in 95% of realizations of the experiment. In the time since the IFGL 
catalog was collated, we have detected a new point source at (R.A., Decl.) ~ (304.46, 36.5)1^. 
(We use this source as an example of the source detection algorithm in §6.41 ) This source 
is < 0.8° from PSR J2021-I-3651, and we expect that failing to account for its emission may 
bias — albeit slightly — the position of the pulsar. We therefore add a new point source to 
the model — a task which can be done interactively — at a trial position estimated from the 
TS map appearing in Figure 16.4.21 We adopt a simple power law with Ansatz parameters 
(r = 2.0 and an integral flux of lO^^ph cm~^ s^^ ) for the spectum, and we (a) re-maximize 
the likelihood, (b) re-localize the new source, and (c) re-maximize the likelihood. The new 
source improves the log likelihood by 361, and after incorporating its emission in the spectral 
model, localization of J2021+3651 yields the results in Table [6^21 With the new model, the 
LAT position is consistent with the radio position, and with a precision of better than one 
arcminute! As in the previous section, we see both the importance of correct modeling in 
likelihood techniques and the consequent benefit of interactive/exploratory analysis. 

6.1.5 Model-independent Spectroscopy 

We introduced a method of generating (nearly) model-independent spectra in ^4.4.6[ By 
model- independent, we mean that we do not assume a broadband functional form for the 

®In fact, it was this exercise — and subsequent failure to arrive at a good localization — that led to the 
detection! 
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source in question and we maximize a series of likelihoods to estimate the flux in each band. 
However, we do assume an Ansatz model in the initial fit in order to obtain parameters for 
the other sources in the ROI in a consistent fashion. 

We recall that the broadband spectral fit for IFGL J2015. 7+3708 was affected by the 
addition of a cutoff parameter to the nearby, bright PSR J2021+3651. We can also examine 
what impact it has on the model-independent spectrum. In Figure 16.1.51 we show a series 
of spectra for IFGL J2015. 7+3708 as we progressively refine the model for the nearby 
sources as described above. In the first panel, we see a strong suppression of the spectrum 
below ~ 500 MeV. This is easily understood as the power law model for PSR J2021+3651 
overpredicts the emission at low energy (the spectrum is concave down), and through the 
poor PSF at these energies the overprediction suppresses the flux estimates for neighboring 
sources. When we model the pulsars correctly, with a cutoff, the low-energy behavior is 
much improved, as seen in the second panel of the figure. Note that there is essentially no 
change above 1 GeV; the PSF is sufficiently good that the sources are resolved. 

Looking closely at this new fit, we see there is tension between the simple power law 
model and the model-independent spectra. Motivated by this evidence for spectral curva- 
ture, we can switch the model for IFGL J2015. 7+3708 to a log parabolic form: 



which reduces to a power law for (3 = 0. After re-maximizing the likelihod, we see the new 
fit is in much better agreement with the model-independent points. 

Finally, we note that the new source we introduced in the previous section is only ~ 0.8° 
from IFGL J2015. 7+3708 and, like J2021+3651 but to a much smaller extent, is likely to 
affect the low-energy spectrum of IFGL J2015. 7+3708. In the final panel of Figure lB.l.Sl we 
show the spectrum after this source has been included in the model. As expected, it "soaks" 
up some of the low-energy emission and the lowest-energy point of the IFGL J2015. 7+3708 
spectrum drops accordingly. 

For completeness, we also include the spectra of the two bright pulsars in Figure 16.1.51 




(6.1) 
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Figure 6.1: The model-independent spectrum for IFGL J2015. 7+3708. At upper left, the 
initial spectrum estimated with all sources fixed at IFGL values. At upper right, the 
spectrum after adding cutoff parameters to PSRs J2021+3651 and J2021+4026. At lower 
left, no external sources have changed but we model IFGL J2015. 7+3708 with a log parabola 
(see text). Finally, at lower right, we have added the new source discussed in the previous 
section. 
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Figure 6.2: Spectra for PSRs J2021+3651 and J2021+4026. 
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6.2 All-sky Analysis 

In discussion of single source (rather, single region) analysis, we mentioned a weakness of 
this approach: by selecting only a subset of the data, we cannot constrain all of the sources 
that contribute to the model for the data. That is, since photons from sources from up to 
10° outside the ROI can disperse into it, we must account for these sources in our model. 
However, we cannot actually fit these sources since only low-energy photons contribute to 
the data. We must have some a priori model. On the other hand, these same remarks 
apply to an ROI containing these external sources, and we have the makings of a "chicken 
and egg" scenario. 

Obviously, the larger an ROI becomes, the less important are these "edge effects", par- 
ticularly for a source in the center. One extreme solution then is to include all data in the 
fit and attempt to jointly maximize the likelihood for all sources in the sky. At this junc- 
ture, such an approach is computationally infeasible. A less extreme version of this solution 
would employ an ROI sufficiently large that these "edge effects" become less important. 
However, using too large of an ROI brings its own drawbacks: 

• the algorithms for convolution, likelihood evaluation, and other tasks have a complex- 
ity 0{Rj^Qj) as well as additional overhead for large ROIs; 

• the plate caree projection begins to show distortion for large ROI radius; 

• scaling parameters for the diffuse model become less effective for correcting local 
deficiencies. 

Another approach is an iterative "bootstrap" analysis in which a set of ROIs are fit 
sequentially with the hope of convergence after a few iterations. On the initial iteration(s), 
only the brighest sources — the most statistically independent and influentiaj^l — are fit, along 

^ Fermi-ljAT sources span about four orders of magnitude in integral photon flux, from a handful of 
sources like the Vela pulsar with fluxes of order fO~^ph cm~'^ s~^ to myriad high-Galactic-latitude sources 
on the detection threshold with fluxes of order fO~^ph cm~^ s~^ . Sources in the top decade or so of 
flux utterly dominate the dimmer sources from a photon count standpoint, and (a) can be fit sufficiently 
without accounting for dimmer sources (b) must be sufficiently fit in order to fit — or even detect — dimmer 
sources. 
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with parameters for the diffuse background. In later iterations, the flux threshold is relaxed. 
With each iteration, the "external" background for an ROI in principle becomes better 
modeled, and the fits for sources within an ROI improve accordingly. 

By combining these approaches — using an iterative, all-sky analysis (ASA) with the 
largest practical ROI radius — we can hope to obtain parameter estimates close to those 
that could be obtained from a true joint maximum likelihood fit. Such an all-sky model, 
beyond the intrinsic interest of characterizing the GeV sky — is an ingredient sine qua non 
for more complex analyses to be discussed in the sequel. Below, we outline a particular 
implementation using pointlike. 

6.2.1 ROI Prescription 

In such an all-sky analysis, there are three initial decisions to make: 

1. the geometry of the ROI (circular, square, other). The infrastructure in pointlike 
is designed around a circular aperture. This geometry maximizes symmetry (useful 
for calculating, e.g., PSF overlap integrals) and makes data extraction particularly 
simple. However, it is impossible to tessellate the sky with circles, i.e., our ROIs 
must overlap, and we must then adopt a prescription for dealing with the same point 
source appearing in several ROIs as well as address a similar problem with the degrees 
of freedom of the diffuse background. On the other hand, if we use a scheme like 
HEALPix to select data, we can tessellate the sky, achieving statistically independent 
ROIs. 

2. the size of the ROI. As discussed above, the size of the ROI determines the magnitude 
of "edge effects". 

3. the (possibly proper) subset of the ROI space within which point sources will be fit, 
i.e., the method for dealing with ROI overlaps. 



In light of these considerations, we adopt the following prescription: 
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Nside Rroi (°) Fraction ROI within HEALPix 



3 



15 



0.54 



4 



12 



0.48 



6 



10 



0.30 



Table 6.3: The fraction of the photons in a circular ROI with radius Rroi contained within 
a HEALPix (sharing the same center as the ROI) of size A-k / {12N^^^^) radians. 

1. Tessellate the sky using HEALPix pixels with A^^jije = 4. These pixels have an area of 
(14.7°)2. Although they are not regular in shape, in discussion we can approximate 
them as squares about 15° on a side. There are 192 such pixels. 

2. About the center of each pixel, extract data using a circular aperture with a radius 
of Rroi = 12°. The radius must satisfy the constraint that the pixel be entirely 
encircled, roughly that Rroi > SOy/G/ir/Nside ~ 41°/A'side. 

3. Perform a maximum likelihood analysis in which point sources within the boundary 
of the HEALPix are allowed to vary while all sources lying in other HEALPix are kept 
fixed. 

These choices of A'side and ROI radius represent a compromise between overall ROI size and 
the ratio of photons within the HEALPix pixel to the total number of photons, in this case 
0.48. (A ratio of one gives statistically independent ROI. Since each HEALPix has 7 or 8 
neighbors, to first order, a given pair of pixels have < 10% of their photons in common.) 

Other choices are presented in Table 16.31 Although an even larger (15°) ROI allows 
reduced redundancy and overlap, the background convolution begins to become problematic 
with an ROI of thise size. 

6.2.2 Iteration Prescription 

In order to allow the array of ROIs to "relax" to an approximately global ML solution, we 
proceed via iteration. In each iteration, we allow all diffuse source parameters to vary, while 
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Figure 6.3: The integral photon flux of IFGL sources above 300 MeV. The vertical lines 
indicate the flux thresholds in the iterative all-sky fit. 
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only a fraction of point sources above a certain flux threshold are allowed to vary. We find 
that the integral photon flux above 300 MeV is a good indicator of source significance and 

oo 

we define the quantity F300 = / dEdN/dE for applying flux cuts. This quantity for all 

300 

sources in the IFGL catalog is shown in Figure I6.2.2[ Concretely, the iteration strategy is: 

1. Perform two fits in which only point sources with F300 > 10^'^'^ are allowed to vary. In 
Figure 16.2.21 this cut includes all sources to the right of the rightmost red horizontal 
line, essentially the hundred brightest point sources. These sources are sufliciently 
strong to alter the diffuse background model parameters but are essentially indepen- 
dent of weaker sources. These two iterations provide a "baseline" model with which 
to fit additional point sources. 

2. Perform three iterations in which the flux threshold is lowered by one half of a decade 
each time. These cuts are also indicated in Figure [6.2.21 

3. Perform three iterations in which the flux threshold is set to 0, i.e., all point source 
parameters are allowed to vary. 

Additionally, at each iteration, after performing an initial ML fit for the ROI, we check 
the free point source parameters for unphysical values, e.g., a photon flux consistent with 
zero or a photon index lower than 0.5 or higher than 4.0. If we flnd such sources, we reset 
their parameters to a nominal value and reflt the ROI. 

Finally, we note that performing all 8 iterations for all ROIs requires on the order of 100 
CPU hours on modern machines, a small cost on a modest computing clusters. 

6.2.3 Tests for Convergence 

To test the convergence of the above iteration scheme, we track the change in the log 
likelihood for each ROI from iteration to iteration. As an example, we consider an ASA 
performed according to the standard iteration prescription using 18 months of data to 
which standard cutqfl were applied. The source model was taken to be that of the IFGL 

^Incidence angle < 66.4°, zenith angle < 105°. 
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analysis, i.e., for point sources the IFGL catalog and for diffuse sources glLiem_v02 and 
isotropic-iem_v02. The log likelihood changes for the eight iterations are shown in Figure 
16.2.31 and indicate the final iteration is well-converged. 

Particularly for ROIs with relatively large overlaps with neighbors (Table EiS] e.g.), total 
convergence is rather difficult to achieve. With each iteration, a pair of coupled ROIs 
will update their parameters to reffect the previous iterations background model, and this 
oscilllation can proceed for many iterations. The magnitude of the log likelihood shift is 
typically of order unity, and we regard these shifts as negligible. An approach that will in 
principle eliminate this issue uses the same HEALPix both for organizing free point sources 
and extracting data, i.e., it abandons circular ROIs. 

6.2.4 Results 

As an example, we show a comparison of the parameters obtained with the pointlike ASA 
and those obtained by gtlike as presented in the IFGL catalog[12j. We apply the iteration 
prescription to the same data set outlined in [12] but performing our own standard cuts (see 
previous section) and calculating the livetime with the pointlike package^ In some sense, 
this exercise is only a sanity check, but we emphasize these results are obtained with an 
entirely independent toolchain. We compare the central values obtained for the power law 
parameters for all 1451 IFGL sources in Figure [6.2.41 and find good agreement. A detailed 
analysis in Figure [6231 find good agreement between the error- weighted parameter estimate 
differences^ and the uncertainty estimates. 

6.2.5 Diffuse Source Studies 

As an additional validation of the treatment of diffuse sources presented in Chapter [J we 
take advantage of the ASA infrastructure to obtain fits to diffuse scaling models and verify 
they are consistent with the input. We used 18 months of data generated by gtobssim for 
the glLiem_v02 Galactic diffuse and isotropic-iem_v02 isotropic diffuse sources and ran a 

^The livetime as calculated by the Science Tool application gtltcube differs slightly from the pointlike 
application, as the livetime calculation in pointlike corrects for the cuts on zenith and incidence angle. 

^"The curious offset of the pointlike fluxes of about 1/2 an error unit is under investigation. 
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Figure 6.4: The log likelihood changes in each ROI for each iteration. The iterations 
proceed from left to right and top to bottom. The precise quantity shown is sign(Alog£) x 
\J (Alog£, and the scale extends from —5 (dark blue) to 5 (dark red). For these large ROIs, 
a change in log likelihood of order a few is negligible. In the upper lefthand corner — the 
initial iteration — the log likelihood shift is dominated by the initial fit of the parameters 
for the diffuse models. The third through sixth panels show the log likelihood improving 
as more sources are allowed to vary. The brightest sources are concentrated in the Galactic 
plane (selection and projection effects), so only by the fifth panel are a significant fraction 
of sources at all latitudes free. The final three panels show the refinement with all point 
sources varying. In the final panel, essentially all ROIs have ceased to improve. Four pixels 
at the Galactic center continue to shift by order a few in an "oscillation" described in the 
main text. 
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Figure 6.5: A comparison of the ML estimates obtained by the pointlike ASA and the 
unbinned HkeUhood pipeline analysis of the IFGL catalog. All sources are fit with a power 
law. The flux density is evaluated at the "pivot energy" , an estimate of the energy at which 
the covariance of the photon index and flux density is minimized. Sources in red correspond 
to IFGL sources with a "c" appended, indicating they lie along the Galactic ridge and their 
spectral values should be taken with caution. Many of these sources may be spurious. A 
subset of these sources clearly departs from the main population for which pointlike and the 
IFGL values are in excellent agreement. 
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Figure 6.6: In the lefthand panel, the difference in the ML parameter estimates scaled by 
the parameter error estimated by the IFGL analysis. The photon indices are in excellent 
agreement, while the flux densities shows a small trend toward underestimating the flux. 
In the righthand panel, we compare the uncertainty estimated by each analysis. This 
comparison is made by estimating the relative error for each method and then taking the 
difference (IFGL - pointlike ASA). The error estimated by pointlike is generally in excellent 
agreement, with some flux error estimates exceeding the IFGL estimate by up to 20%, 
an unimportant difference. It is in any case not surprising that an unbinned method may 
deliver slightly smaller error estimates. 



107 



single iteration of the ASA procedure. The glLiem-v02 was scaled by a power law with 
free normahzation and photon index, while the isotropic-iem_v02 source was scaled only 
by a constant. During a ML fit, these parameters vary due to statistical fluctuations, but 
the distribution means should be consistent with the input model. The distribution of the 
best-fit parameters for each ROI is shown in Figure 16. 2. 5^ from which it is clear that the fit 
parameters indeed agree with the input model. 

6.3 Sky Maps 

In this section, we deviate from likelihood analysis (but not from all-sky analysis!) to discuss 
depiction of Fermi-hAT data. Constructing maps that give a good visual representation of 
the sky is challenging because (a) low energy events are significantly smeared by the PSF and 
(b) the emission is highly anisotropic, peaking strongly in the Galactic plane. Constructing 
a simple counts map — a histogram — with a fixed pixel size is not optimal since we will either 
lose resolution in high-count regions or suffer from "shot noise" in low-count regions. 

A well-known alternative to a histogram for visual representation of random data is 
kernel density estimation. In one dimension, e.g., the a kernel density estimator / for the 
probability density function / given a set of data {xi}, is 



where g is the kernel, some normalized density function, typically a gaussian, and b is the 
bandwidth, essentially a smoothing parameter. 

An obvious extension to the current use case is adopting the PSF for the kernel, with a 
bandwidth then naturally parameterized by the reconstructed energy and event class. As 
part of the all-sky analysis, we have already constructed for each ROI, in each energy band, 
an approximate, exposure-weighted PSF, as well as binned the data in a sparse format that 
eases the computation. We can thus calculate a kernel density estimate for the photon 
density on the sky as 





i=l 




(6.3) 



i=l 



j=l k=l 
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Figure 6.7: The best-fit values of the parameters for the two scahng models discussed in 
the text. The null value for the constants is 1, while for the photon index of the power 
law scaling the Galactic diffuse, 0. The abundant photons in the Galactic plane strongly 
constrain the Galactic diffuse model, while at high latitudes the sparse photons allow large 
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Figure 6.8: An image (in Hammer- Aitoff projection of Galactic coordinates) of the GeV sky 
produced by kernel density estimation with the PSF with 18 months of data. The units are 
somewhat arbitrary but can be roughly interpreted as counts per steradian. All energies 
> 100 MeV are included. 

where we symbolically indicate the PSF dependence on energy and conversion type through 
the band subscript j and where gives the observed counts in the kth pixel. In practice, 
we would evaluate /(f^) over some grid corresponding to an image projection. 

We show such an image for the entire 7-ray sky in Figure 16. 3i An image restricted to the 
Cygnus region of the sky, and to relatively high-energy photons, is displayed in Figure [Ol 
It is useful to compare this image, particularly the evidence for the new source mentioned 
in the discussion of the spectral analysis of the Cygns region, with the series of TS maps in 
Figure [6X2l 

6.4 Source Finding with TS Maps 

An important application of likelihood is searching for new sources. By examining how the 
likelihood improves by expanding the model to include new sources, we can determine (in a 
statistically calibrated fashion, as we show below) whether the sources are required by the 
data. 
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Figure 6.9: An image (in zenithal equal-area projection of Galactic coordinates) of the GeV 
sky produced by kernel density estimation with the PSF with 18 months of data. The image 
shows the Cygnus region and has overlaid the locations of sources in the IFGL catalog. 
In constructing this image, photon energies were restricted to 2200 < E/MeV < 10000 
(4400 < E/MeV < 20000) for events converting in the front (back) of the detector. Due to 
the ~ 1/S dependence of multiple scattering and the factor of ~ 2 difference in the radiation 
length of the front and back radiations, the two samples of events have approximately the 
same angular resolution. 
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6.4-1 A Test Statistic for Source Detection 

The typical technique is a hkehhood ratio test in which the ratio of the Hkehhood under 
the null hypothesis (no new source) is compared to the likelihood under the alternative 
hypothesis (there is a new source with some position/spectrum). The likelihood ratio is a 
test statistic (TS) and we reject the null hypothesis (claim detection of a new source) if the 
TS exceeds some threshold. 

To ensure a small probability of Type I error (false positive), it is useful to know the 
distribution of the TS in the event the null hypothesis is true. Then we immediately know 
our probability of Type I error is the tail probability of the distribution integrated from the 
threshold. (If we test multiple, independent data sets, or check the same set for multiple 
sources, we must of course multiply this probability by the number of total trials.) A 
celebrated result from WilksfH^ gives the asymptotic (i.e., large sample) distribution for a 
certain class of likelihood ratios. If the models for the null and alternative hypotheses are 
nested and have, respectively, hq and n parameters, and if the null values do not fall on the 
boundary of the parameter space, and if the likelihood satisfies certain regularity conditions, 
then the likelihood ratio is distributed as Xn-no^ chi-squared with degrees of freedom 
equal to the difference in the dimensions of the parameter spaces. This result is extremely 
useful for testing the statistical significance of additional components in models provided 
the new component is not on the boundary of the parameter space in the null hypothesis. 

Unfortunately, the likelihood ratio test for source detection does not satisfy the criteria 
for Wilks' Theorem. For example, suppose the likelihood for a data set in the presence 
of an additional point source at a fixed position with a fixed spectral shape but free flux 
parameter is calculated. The models (with and without the new point source) are clearly 
nested, but the new parameter — the flux — is in the null hypothesis. Since flux is positive, 
this value lies on the boundary of the parameter space. 

Fortunately, Chernoff[35] was able to extend the results of Wilks to show that, in cases 
where the alternative (null) model has a n (n — 1) dimensional parameter space, and the 
additional parameter is resticted to one side of an (n — 1) dimensional plane — as is the case 
for the scenario outlined above — then the null distribution is an equal mixture of (5(0) and 
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Xi- It is easy to show then that the chance probabihty of observing a value in excess of TS 
is given by the tail integral of a normal distribution from \/TS to oc, and the confidence 
level can be quoted as ^^VTSa^^ . Analysis of EGRET data, applying the same test for new 
sources, observed this distribution in Monte Carlo simulations [66J. 

6.4-2 Generating a TS Map 

To generate a TS map, we begin with a single region, viz. one of the HEALPix/circular 
aperture pieces of the ASA. After the ASA, we have a consistent model that should — if we 
have represented every source correctly — account for every photon in the GeV sky. Next, 
we subdivide the HEALPix pixel into sub-pixels small enough that we can resolve new 
sources. For A'^gide = 4, we might choose to divide each side of the base HEALPix pixel 
into 150 segments, for 150^ sub-pixels approximatly 0.1° on a side. (It will be recalled that 
this procedure is the typical method for producing a HEALPix pixel of finer resolution; this 
example grid has A'side = 600.) 

We regard the background model as fixed and we assume a power law model with T = 2.0 
for the putative point source. (Other indices can be adopted.) For the pixel at position rij, 
the log likelihood for a given band (using the notation of Chapter |4]) in the presence of a 
new source with flux J^, is 



£(J7i, J-,) = -0,{ni) Ns{:F,) + n . log 



(6.4) 



where b represents the contribution of the background to each data pixel and / is the value of 
the PSF for a source centered at fij at each data pixel. The likelihood is single-dimensional 
(by virtue of the fixed photon index) and can be easily maximized to determine the best-fit 
J-g (denoted J^, for each fij. The TS is twice the quantity obtained by subtracting the 
null log likelihood {Tg = 0) from the log likelihood evaluated at As defined above, 
£(i1i,0) = 0, so 

TS{Qi) = 2 X C{ni,fs). (6.5) 

We determine this quantity for each sub-pixel in each base (A'side = 4) HEALPix pixel. 
Since the background model is assumed to be fixed, the base pixels can be processed in 
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parallel. The output is the TS evaluated on a very fine grid, producing a TS map for 
the entire sky. It is then a relatively simple matter to "convert" these TS values to new 
sources by selecting the highest, isolated TS excesses, incorporating them into the model, 
re-maximizing the likelihood via the ASA, crafting a new TS map, etc. 

As an example of the efficacy of TS maps, we present a case example of searching for a 
source in the Cygnus region in Figure [6.4.21 

6.5 Source Classification 

It has historically been the case that detection of 7-ray pulsations from neutron stars follows 
detection of radio pulsations [82t W\\ [8] or even X-ray pulsations [551 [26]. This is due to the 
extreme paucity of photons collected from HE sources; long integration times are required, 
and even though pulsars are relatively stable rotators, the phase space in frequency, fre- 
quency derivative, etc. that must be searched has made direct pulsation detection infeasible. 
Instead, radio observations of a particular pulsar are used to build a timing solution (see 
Chapter [7]) that can be used to map the time of arrival of a detected 7 ray to the rotational 
phase of the neutron star. With a stable timing solution, photons collected over weeks, 
months, or even years can be accumulated, building up profiles in phase that can be tested 
against a uniform distribution (Chapter [7]). 

With the improved angular resolution and effective area of Fermi-LAT , GeV telescopes 
have finally reached some parity. Time-differencing techniques |20) have led to the discovery 
of over 24 new pulsars [31 [75], wrapping up two longstanding mysteries |13l [T] on the way. This 
number is comparable to the number of new 7-ray pulsars detected to date by Fermi-LAT 
using timing solutions [T7]. 

Besides direct detection of pulsations, "7-ray initiated" detection of pulsars is opened 
by the sensitivity to unpulsed sources. The high yield so far of detected pulsars suggests 
many of the > 1000 sources detected by Fermi-LAT so far [12] must harbor neutron stars 
emitting pulsed 7 rays. There are many reasons why such sources may be difficult to detect 
with direct pulsations searches. Many young pulsars exhibit stochastic timing noise [19], 
limiting the maximum integration period over which a detection may be obtained. Even 
if the pulsar is stable, long integration times require fine-grained searches and the possible 
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Figure 6.10: A TS map constructed of the Cygnus region to demonstrate the source finding 
principle discussed in the text. In the top panel, only the best-fit diffuse model (Galactic 
+ isotropic) is used for a background model. All point sources then contribute to the 
calculated TS, and the map is dominated by the exceedingly bright pulsars J2021-I-4026 
and J2021-I-3651. The range spanned by the color scale is 25-40,000. In the second panel, 
the two bright pulsars are included in the background, and the color scale is now 25-2200, 
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addition of additional search parameters (e.g., a second-order term in the Taylor expansion 
of phase-versus-time, i.e., a second derivative of frequency.) Integration times comparable 
to a year also require a precise position to correct for the changing light travel time (the 
Romer delay) as the earth revolves about the sun. Unfortunately, dim sources (requiring 
long integrations) are also difficult to localize and the problem becomes rapidly intractable. 
Finally, millisecond pulsars are difficult to detect because of the extreme sensitivity to 
precise values of period and period derivative, and millisecond pulsars in binary systems are 
essentially undetectable by Fermi-LAT . 

One approach, then, is to search for radio pulsations from promising Fermi-LAT sources. 
The dimmest known pulsars can be detected in hours with modern radio telescopes, solving 
most of the problems outlined above. For this approach to fail (a) the source is not a 
pulsating neutron star (b) the position is not good (c) the pulsar radio beam is not aligned 
with the earth (d) the signal was temporarily suppressed by scintillation. If pulsations are 
detected, we are essentially guaranteed both a new radio pulsar and a new 7-ray pulsar, 
since the chance coincidence of radio pulsars and LAT sources is quite low. 

What constitutes a promising Fermi-LAT source? A good candidate should 

1. have a reliable position with an uncertainty comparable to or smaller than the main 
beam of the radiotelescope used to conduct the search. For the 100-meter Green 
Bank telescope, this constraint ranges from a few arcminutes at 2GHz frequencies to 
up to 0.3° (error radius) at 350MHz. Sources within the Galactic plane suffer some 
additional systematic errors (e.g. unmodeled neighboring point sources, mismodeled 
diffuse emission). 

2. be unassociated with an extragalactic source. By comparing the positions of Fermi- 
LAT sources with multiwavelength catalogs, probabilistic associations of 7-ray sources 
with counterparts can be madep!2| I16j . If a source has a good probability to be 
associated with another source class (particularly active galactic nuclei (AGN), the 
largest class of Fermi-LAT sources), then it is sensible to pursue better candidates. 

3. show no variability. 



116 



4. have a spectrum (a) like known pulsars and (b) unlike known AGN. Essentially all 
pulsars detected to date have spectra consistent with a power law + exponential 
cutoff and have a constant flux when averaged over many periods [T7]. In particular, 
the observed cutoff energy (photon index) is generally < 5 GeV (< 2). On the other 
hand, AGN have spectra broadly consistent with power laws. Flat-spectrum radio 
quasars, a subclass of AGN, almost universally have F > 2, and thus a candidate can 
be spectrally vetoed by observation of a falling spectral energy density from 0.1 — 1.0 
GeV. BL Lac objects, a second subclass with harder spectra, can generally be excluded 
by observation of significant emission above 10 GeV. 

Thus, we seek to analyze all known Fermi-LAT sources with the aim of (a) estimat- 
ing the best possible positions and uncertainties (b) excluding known variable and known 
associated sources (c) fitting multiple spectra to assess the statistical strength of the ex- 
ponential cutoff parameter (d) producing a spectral energy density plot to visually assess 
spectral features. The all-sky analysis discussed above can handily achieve (a), (c), and 
(d). We outline a specific analysis scheme along with radio surveys conducted to search the 
resulting candidates and present preliminary results. 

6.5.1 All-sky Analysis for Pulsar Candidate Identification 

We performed an all-sky analysis as described above (in a more preliminary version) using 
11 months of Fermi-LAT data and an internal (to the LAT Collaboration) version of the 
source list that became the IFGL catalog. (The IFGL catalog was based on the same 11- 
month dataset employed here.) To supplement the analysis, we made use of the association 
and variability studies performed for the catalog preparation. For each source, we fit a 
power law spectrum, then added a cutoff, allowing a likelihood ratio test of the significance 
of the cutoff. We generated a spectral energy density plot for each source to assess its shape 
qualitatively. An example of these seds — which include also the other pieces of information 
save position — appears in Figure 16.5.11 
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IFGL J2021.5+4026 var: 3.6 gtsrcid: 1.00 dlogl: 437.88 IFGL J0023.5+0930 var: 8.0 gtsrcid: 0.00 dlogl: 3.54 
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IFGL J2253.9+1608 var: 2354.0 gtsrcid: 1.00 dlogl: 173.93 IFGL J0345.2-2355 var: 18.1 gtsrcid: 0.00 dlogl: 2.85 
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Figure 6.11: A gallery of typical plots used for classifying unidentified sources as good pulsar 
candidates. In the upper left panel, a known LAT PSR J2021.5+4026[3] gives an example of 
a typical pulsar spectrum: flat or rising in the 0.1 — 1 GeV decade, a strong cutoff, and a low 
"variability index". In the upper right panel, a much dimmer pulsar candidate (at which 
location radio pulsations were subsequently detected), which shows similar features scaled 
down by two orders of magnitude. At bottom left is the quasar 3C 454.3 [23] exemplifying 
the features of typical AGN emission: a falling spectrum in the 0.1 — 1 decade and extreme 
variability. At bottom right, a comparable (unidentified) source showing similar features 
scaled down by two orders of magnitude in flux. It has modest variability and would not 
be considered a good candidate due to this and its falling spectrum at low energy. 
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6.5.2 Observations 

Green Bank Telescope, 350 MHz 

A Fermi Cycle 2 Guest InvestigatoiF^ proposal (P.I. Mallory Roberts) to observe approxi- 
mately 50 unidentified Fermi sources at high Galactic latitudes with the 350 MHz receiver 
of the 100-m Green Bank Telescope (GBT) was accepted in 2009. Since the spectrum of 
pulsars in radio is typically a falling power law, observations at low frequency are more 
sensitive provided there are not significant disperserive effects from the interstellar medium 
(ISM). By restricting candidates to high Galactic latitudes, ISM effects are minimized. 

The ASA preparation and classification scheme was used to draw up a candidate list. 
An initial pass rejected any candidates at low (< —5° or > 5°) Galactic latitude and with 
Decl. < —40°, essentially the lowest latitude visible from GBT. Since the full-width at 
half-maximum (FWHM) of the 350 MHz beam at GBT is about O.ofl no cut was made 
on the positional uncertainty candidate sources as virtually all sources in the IFGL catalog 
have a 95% error radius < 0.3°. For sources satisfying the position consitraints, plots such 
as those presented in Figure [5.5.11 were generated for the remaining sources. By considering 
spectral shape, cutoff strength, association with known blazars, and variability (at time 
scales of w 1 month), sources were classified as 0, 1, 2, or 3, viz. unacceptable, poor, good, 
and excellent. 

Observations were carried out using the GUPPI[44] backend with the 350 MHz receiver. 
During the campaign, most sources classifed as "2" or "3" were visible and observed. In 
total, 47 sources were observed with a typical integration time of 32 minutes. The collected 
data requires considerable computation to fully search for pulsationq^ and have not yet 
been fully analyzed. However, the first 215 seconds of each observation have been fully 
searched, and five new MSPs have been detected. A brief summary of their properties is 

^ ^ http : / / fermi.gsfc . nasa.gov / ssc / proposals / cycle2 / 

^^To order of magnitude, the FWHM is given by the diffraction limit, FHWM « X/D « 0.5°, with 
D = 100m. The actual beam depends on the exact geometry of the receiver and the blazing of the dish 
surface. 

'^^E.g., many millisecond pulsars will be found in binary systems, requiring an "acceleration" search in 
which a linear correction for gravitational acceleration of a given magnitude is made. 
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Name NS Period (ms) Orbital Period (hr) Companion Mass (Mgun) 



J0023+09 3.05 3.3 0.016 

J0340+41 3.30 Isolated N/A 

J1302-32 3.77 >24 N/A 

J1810+17 1.66 4.9 0.089 

J2215+51 2.61 4.2 0.22 



Table 6.4: The properties of the 5 milhsecond pulsars detected to date in the GBT 350 
MHz survey. The binary J 1302-32 awaits further timing to constrain the orbital parame- 
ters. PSRs J0023+09 and J1810+17 are hkely of the "Black Widow" class[50j, while PSR 
J2215+51 presents eclipses. 



given in Table 



Parkes Observatory, 14-00 MHz 

A second serendipitoua^i survey was conducted with the 1.4 GHz receiver on the 64-m 
dish at the Parkes Observatory (P.I. Fernando Camilo). The source list was prepared as 
described in the previous section, mutatis mutandis, e.g. restricting the candidates to have 
Decl. < —40°. A total of 14 sources were observed with integration times ranging from 1 
to 2 hours. Five new millisecond pulsars, listed in Table [63} were detected. 

6.6 Summary 

In this chapter, we demonstrated the applicability of pointlike to both a restricted analysis 
(in the sense of a small region with a subset of sources) and to all-sky analysis (ASA). For the 
former, we carried out a spectral analysis in the Cygnus region, obtaining estimates of the 
parameters of broadband spectral parameters ( ^6.1.3p . source positions ( ^6.1.4p . and model- 
independent estimates of source spectra ( ^6.1.5p . An exploratory approach was crucial to the 

After a poster presented by P. Bangale at the 2010 meeting of the High-energy Astrophysics Division of 
the AAS. 

^^Observations were carried out during scheduled but unutilized maintenance time. 
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Name 



Classification 



JOlOO-64 



Bi 



J1514-49 



Bi 



J1658-53 



Iso 



J1747-40 



Iso 



J1902-51 



Bi 



Table 6.5: The five millisecond pulsars discovered in a serendipitous Parkes survey. 

success of this analysis as we observed the interplay of the bright source PSR J2021+3651 
with two dim, neighboring sources. 

Next, we outlined an approach for ASA in which we can obtain a good estimate for the 
spectral parameters (and positions) of all (known) sources in the sky. The method adopted — 
organizing sources by HEALPix while extracting data by cone — was a compromise between 
manageable data sets and optimal estimates for the parameters. We demonstrated that 
the iterative approach converged, while comparison with the IFGL catalog indicated good 
agreement between the two methods, i.e., a valid solution. 

Finally, we made use of the model estimated from the all-sky analysis (ASA) for ad- 
ditional applications: we constructed TS maps for source finding using the ASA spectral 
model as background; we generated a kernel density estimator map of the sky; and we used 
the ASA pipeline to perform a custom analysis of known sources to identify good pulsar 
candidates. In the next chapter, we shall make use of ASA-estimated spectra in building 
more sensitive tests for periodicity detection. 
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Chapter 7 

PROBABILITY- WEIGHTED STATISTICS AND PULSED 

SENSITIVITY 

We discussed in §6.51 the difficuhies involved in detecting pulsed signals with HE instru- 
ments. There, we focused on how the improved sensitivity of the LAT enabled detection 
of pulsation using 7 rays only, or through an initial detection of a point source with subse- 
quent pulsation detection in a targeted radio search. Here, we focus on the "other side of 
the coin" — since the LAT boasts much improved sensitivity, by using timing solutions, we 
should be able to detect periodic emission from many known sources. 

And this has turned out to be the case, e.g., the first detection of orbitally modulated 
HE 7 rays from LS I +61° 303 [7], the first detection of pulsed 7 rays from a millisecond 
pulsar [9j, and the first new young pulsar detected using a radio timing solution [8j. 

The pulsars detected so far have been sufficiently bright that relatively simple techniques 
have sufficed to discern pulsation. We discuss the details fully below, but in essence a set of 
photons is extracted from the data, the arrival times are converted to phase using the timing 
solution, and a statistical test for uniformity is performed. For the same reason that aperture 
photometry is inadequate for spectral analysis, so too is this procedure suboptimal for time- 
domain analysis: any given aperture that one selects will either be strongly contaminated 
by other sources or be selected so stringently that most of the signal is removed. In either 
case, the test results will depend strongly on the cut. 

For spectroscopy, we were able to tackle the problem of the multi-scale PSF and source 
confusion by using likelihood to account for the IRF. However, while we could model nearly 
any spectrum well with a simple model with only a few parameters, this is not the case for 
time-domain analysis. Gamma-ray light curves tend to be complex, and more importantly, 
since we are searching for periodicity, we do not know ab initio what the light curve should 
look like. Therefore, we do not pursue a full likelihood analysis here but shall instead adopt 
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a hybrid approach in which we combine the results of spectral analysis with simple pulsation 
tests. 

In further distinction to our approach to spectroscopy, we adopt an unbinned time- 
domain analysis. In this way we lose no information and — more importantly — incur no bias 
from an arbitrary choice of binning^. Since we will be using the results of a previous spectral 
analysis and don't have to worry about jointly fitting sources, we can use a relatively small 
(< 2°) ROI. This makes unbinned analysis practical from a computational standpoint. 

Specifically, we shall study a class of unbinned statistics composed of sums of sines and 
cosines of the observed phases. Next, we show how these statistics can be altered to weighted 
sums of sines and cosines. By using the results of a ML spectral analysis, we can calculate 
the probability that each photon originated from the source we are testing for pulsation, 
and this probability is an excellent choice of weight. We go on to show that the probability- 
weighted versions of the statistics drastically reduce the probability of Type I and Type II 
error through tests on ensembles of simulated pulsars. The machinery that we develop in 
this section can be easily applied to determine the pulsed sensitivity of the LAT. 

Before proceeding to periodicity searches, we discuss the necessary first step of converting 
the observables — photon times of arrival — into phase of the object of interest. 

7.1 Time-to-phase Mapping 

A preliminary step in any periodicity search is to map the event times to the appropriate 
phase, e.g., the rotational phase of a neutron star or the orbital phase of a binary. For 
instance, the reconstructed time associated with a Fermi-LAT event is recorded in Mission 
Elapsed Time, essentially TAI (International Atomic Time) with an origin of January 1, 
2001. In order to connect with sources at astronomical distances, the proper reference is 
time measured at the Solar System barycenter, which eliminates dependence on the (time- 
dependent) configuration of the Solar System. Using either an exact ephemeris for some 
astronomical source, or even simply a guess for the period, the barycentered times can be 

^Pulsar light curves can be quite sharply peaked, e.g. P2 of PSR J0030+045l[9], or relatively broad, e.g. 
PSR J1836+5925|13]. Choosing the "wrong" bin size can drasticaly decrease sensitivity to a class of light 
curves. 
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converted to the phase of the distant object. Thus, to convert a time in MET to phase, we 
must: 

1. correct for hght travel time, e.g., to the center of the earth, requiring knowledge of the 
S/C position at the given MET. Small general relativistic effects may also be included. 

2. correct for the earth's position as well as other Solar System effects (e.g., if the direc- 
tion of the source passes near the Sun at the given MET, it suffers a Shapiro delay). 
The unit of time at the Solar System barycenter is also slightly different from TAI. 
Light travel time and Doppler shifting from the earth's revolution about the sun are 
included. 

3. correct to the distant object rotational phase. If the object is a member of a binary 
system, the correction to the distant system barycenter is nontrivial, and must account 
for Doppler shift due to orbital acceleration, Shapiro delays, and light travel time 
delays. 

For Fermi-hAT data, there exists a Science Tool, gtbary, to convert MET to TDB (Barycen- 
tric Dynamical Time) or to geocentric time, i.e., time measured at the center of the earth. 
A plugin for tempo^ will also perform this transformation as well as fold the photons on 
standard- form ephemerides for pulsars. 

For the remainder of this section, the exact procedure we use will be irrelevant, and we 
will simply assume some black box has produced from a set of event times, {ti}, a set of 
phases, Our convention is to identify all values of phase that are congruent modulo 

1, so a sweep of (j) from to 1 defines a complete cycle. 

7.2 Statistics for Pulsation Searches 

Suppose that in the null hypothesis — emission from the source in question is not periodic — 
the {(pi} have the cumulative distribution function (cdf) That is, the probability to 

observe the random variable (rv) $ with a value less than or equal to (p is If the 

http; / / www.atnf.csiro.au / research /pulsar / tempo2 / 
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distribution admits a probability density function (pdf), d/d(j)F^{(j)), we denote this function 
/$((^). We follow this notation for distributions throughout. Often, the {(pi} in the null case 
will be uniformly distributed, ^ ~ [/[0, 1) — )• F^{(j)) = (p. However, the null distribution 
will not remain uniform if the instrument response varies on timescales comparable to the 
period of interest. Many of the results discussed below only hold under a uniform null 
distribution. Departure from uniformity is thus a serious complication, though we present 
two avenues of attack in Appendix iBl 

The alternative hypothesis specifies another distribution, say Ideally, this distri- 

bution is known a priori; in practice, we must estimate it, that is, provide a morphology for 
the light curve. We concentrate here on the class of tests — a subclass of those considered 
by Beran[25] — that represent the light curve using a Fourier expansion estimated from the 
data, i.e., the true (unknown) light curve is 

oo 

/((/,) = 1 + ^ Q,^ cos{2-Kk(t)) + pk sin(27r/c0), (7.1) 

k=l 

and we will attempt to estimate the coefficients. In practice, we will truncate this sum at 
some finite harmonic. (We eschew the complex notation for Fourier series here to avoid the 
complication of complex rvs. However, for brevity in the sequel, we will refer to the and 
the /3fc collectively as the V'fc-) We define 

Cfc((/)) = cos{2-Kk (I))] (7.2) 
Sk{(t>) = sin(27rA;(/)). (7.3) 

For brevity we will refer to these variables collectively as G^- Since $ is a rv, so are the 
Gfc, although the and are clearly mutually dependent. If <I> ~ [/[0, 1), the distribution 
function for these variables is simple and independent of k: fG^.{x) = [TrV—x"^]^^- We 
consider the case of nontrivial distributions for <I> in Appendix [Bl We note in passing the 

characteristic function for the distribution f{x) = [vr-v/l — 

1 1 
, , , . f dxexr)(itx) 2 f cos(tx) ^ , , ,„ 

0x(O =< expztx >= / J2_^' = - / ^=4=4 = Mt), (7.4) 

-1 

where we have appealed to symmetry in the integrand and the definition of the Bessel 
functions of the first kind. It is easier to calculate the moments directly than to differentiate 
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the above expression, and we have that the nth absolute moment is given by 

where (3(x,y) is the Euler beta funciton and T{x) is the usual Euler gamma function. 
Using the new variables, we define the empirical trigonometric moments, 

^ n \ ^ 

afc = -V'cfcj; /3k = -'y^Ski, (7.6) 
1=1 1=1 

where e.g. c^i = cos(27r (/>j). These quantities are essentially the Monte Carlo estimators for 
the Fourier coefficients for the kth harmonic. From the expression above for the absolute 
moments (or simply noting that Gk G [— 1 , 1] regardless of the distribution) , the mean (/i) 
and standard deviation (a) are finite and the central limit theorem (CLT) applies to the 
ipj.. That is, 

lim y/niipk - fiGk)/(^Gh ~ A/", (7.7) 

n— >oo 

where denotes the standard normal distribution. If $ ~ C/[0, 1) , /Ug^ = and ac^. = 1/ 
(Eq. ES]), then 

lim V2^ ipk ~ AA. (7.8) 

n— >oo 

The third moment of Gk is 4/37r, so the Berry-Esseen inequality gives the rate of con- 
vergence as oc 1/^/n. Specifically, let ipkin) denote one of the statistics above for a sample 
size of n. Then 

1^ / N ^/ M / ^2 0.847 

sup|F^,(„)(x)-$(x)| < (7.9) 

X o vr y n y n 

i.e., the difference between the cumulative distribution of the statistic and the cdf of the 
normal distribution, ^{x), is absolutely bounded. (Here, we have used C = 0.7056, the 
upper bound determined Shevtsova [76].) In practice, it is not this rate of convergence that 
will be the limiting step in later statistics we develop. However, before moving on, we men- 
tion that an exact expression for F^^(„)(j;) may be obtained by inverting the characteristic 
function noted earlier. Since -0*; is simply a linear combination of Gk, the characteristic 
function for ipk is given by 

</>>!.,(«)(*) = Jo (^) . (7.10) 
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Alternatively, if an approximate result, but more accurate than the CLT, is desired, one 
can use the known moments in an Edgeworth series. 

The quantities ^pk ai'e used in several statistical tests in the literature; see |42] for a 
brief review. In particular, invariance under rotations of phase become manifest when using 
these variables. For instance, if we redefine our zero of phase (p ^ cf) + 6, then 



(7.11) 



cos(27rA;(5) —sm(2Trk6) 
sm{2iTk5) cos{2iTk5) 

i.e., a phase shift amounts to a rotation in a//3 space for each harmonic, so any statistic 
involving only functions of the magnitude of these vectors, a| + /3|, will be independent of 
5. The simplest such statistic, a sum of for the first m harmonics, has a long history in 
high-energy gamma-ray observations, and we introduce it below. 

7.3 The Test 

The statistic, defined as 

m 

Zl = 2nY,c^l + Pl (7.12) 

k=l 

has been the workhorse of searches for 7-ray pulsars for years. A Zl test was used in a 
search for pulsations in COS-B data using timing solutions for 145 radio pulsars [32j. A 
similar search of EGRET data[69j used Z^ tests with 1, 2, and 10 harmonics, the H test 
(see below), and the "Z2_|_4" test which is defined as above but with summation restricted 
to the 2nd and 4th harmonics. It forms an integral part of the H test, and continues to see 
use in analysis of Fermi-LAT data, especially for sources where approximately sinusoidal 
modulation is expected. 

From the discussion above, in the case of uniformly-distributed phases, y/2/nipk asymp- 
totically follows the standard normal distribution. is thus, asymptotically, distributed 
as the sum of the squares of 2m standard normal variates. As is well known, the square 
of a normal variate is distributed as x?) i-e., chi-square with one degree of freedom, and 
the sum of 2m squares of independent, normally distributed variates is distributed as xim; 
chi-square with 2m degrees of freedom. Hence, if and only if these 2m ipk are statistically 
independent, then Z^ is asymptotically distributed as with 2m degrees of freedom. In 



127 



addition to this simple null distribution, the test is powerful [42], quick to compute, and has 
the already-discussed property of invariance under phase translation. It underpins much of 
the statistical framework we will develop, and so we take some time to assess its properties. 

7.3.1 Validity of Asymptotic Calibration 

First, there is the point of the asymptotic "independency" of the ipk- It is clear that, for 
only a few phases, these variables are highly dependent. Indeed, for a single observation, 
can be inverted to find find the original rv, (up to uncertainty from congruence modulo 1), 
and hence any other ipi^. However, for the uniform null distribution, this ability to infer the 
phase from the sum of sines and cosines breaks down with increasing sample size. E.g., if 
we have observed 10 phases and we have oq = 10, then we know that = for each phase. 
However, oq = 10 will almost surely not be observed, whereas oq ~ is quite likely, and 
there are many combinations of 10 phases that can yield it, i.e., it tells us very little about 
the constituent phases. As N increases, the tails are further suppressed, and consequently 
information about the {(f)} and hence the distribution of -01 conditioned on the value of ipj. 
It is in this sense and limit that the ipk become "independent". 

On the other hand, for some null distributions, it is impossible for the V'fc to become 
independent, even as N ^ oo. A clear albeit unrealistic example is f{(p) = d{(p ~ 0o)) in 
which case V'fc remain perfectly correlated. In a more realistic case, any peaking in phase 
of the null distribution will lead to long-surviving mutual depedence of the ipk- Thus, 
while knowledge of the null distribution will allow us to define ipk such that the marginal 
distributions are still asymptotically normal (i.e., the CLT applies), it will not be the case 
that the sum of the ^l^k becomes chi-squre distributed. More details can be found in Appendix 

m 

Related to the question of mutual independence of the V'fc is the rate of convergence to 
the asymptotic distribution. A Monte Carlo study of the converge as a function of both 
sample size and maximum harmonic (m) is shown in Figure [7.3.1[ Evidently, a sample size of 
about 50 phases are required for robust significance estimation at the 3a level. Convergence 
appears to improve for higher values of m. Interestingly, for low photon counts, low values 
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of m underestimate the significance, leading to an increased probability of making a Type 
II error. In contrast, high values of m overestimate the significance, increasing the chance 
of making a Type I error. 

7.3.2 Modification for Event-weighting 

A set of {(pi} will typically contain background events. As discussed above, for most high- 
energy instruments, background events make up a significant fraction of the data selected, 
and may even dominate the signal. Since the significance of a detection scales with the 
signal-to- noise ratio (SNR), we may try to apply stringent cuts to the data to try to enrich 
the signal. We don't a priori know the optimal cuts, so we must often choose between 
sub-optimal cuts or tuning the cuts with the data. The former dilutes the power of the test, 
while the latter increases the risk of a Type I error if we fail to re-calibrate the test statistic, 
typically with Monte Carlo. 

An alternate approach to enriching the SNR is event-weighting: we apply no stringent 
cuts to the data but instead assign each event a weight. Events associated with the source 
are weighted heavily, those with the background lightly. Below, we discuss a means of 
estimating the probability that the event in question was caused by detection of a source 
photon and we show that using these probabilities as weights is a powerful technique. 

To include the weights, we define the weighted empirical trigonometric moments. 



where vui is the weight assigned to each event. (Here, the c^i and s^i can either be derived 
directly from the phases or from the cumulative distribution in the case of non-uniformity 
as outlined in Appendix [Bj) The weights may themselves be random variables (if deter- 
mined from some model fit to the data), or they may be simple scalars. If the former, the 
distribution of the {wi} must possess finite mean and variance since w G [0, 1]. In the null 
hypothesis, the weights and the phases are independent rvs, so the moments of the products 
are simply the product of the moments, and it can be shown that the Lyupanov Condition 
for the CLT to hold will apply for any choice of and any such weights distribution. On 
the other hand, if the weights are taken as simple, non-random numbers, ipk is now simply 




1=1 



i=l 
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Figure 7.1: The observed distribution for the statistic for various sample sizes and a 
collection of maximum harmonics. The asymptotic calibration is shown as a solid line, while 
the solid band gives the empirical distribution function of the Monte Carlo realizations. The 
width indicates the statistical uncertainty estimated as y^A^(>= TS), i.e., the square root 
of the number of counts with a TS greater than or equal to the current TS. 
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a different linear combination of gk- The characteristic function for ipk becomes 



i=l 1=1 



1 



1- E-nv 



,i=l 



(7.13) 



2n 

For this to match on to the characteristic function of a normal distribution, i.e., for the 

n 

CLT to again apply, we must simply include '^1 iii ^ ^^w definition of ipk, i-e., 



"I 
1=1 



-1 

2 



lim V2^ iy^wf] iJk^ M. (7.14) 

n— >oo \ — ^ / 

\j=l / 

Practically, the distribution function for the weights is unknown, so we would replace the 
distribution moments with the sample moments, i.e., we would follow the above prescription 
regardless of how we interpret the weights. 

The weighted ipk are now asymptotically normally distributed. Their asymptotic mutual 
independence also remains, in the null the distribution of weights and phases are 

independent. Thus, the sum of the weighted % will again be X2fc- With this result, we 
define the weighted test statistic, Z'^^, 



-1 

2 m 



\j=l / k=l 

To verify the distribution, we generated a set of weights representative of those observed 
in actual data. Precisely, we drew a signal, Sj, from a xi distribution and a background, 6j, 
from a xic a'^d calculated Wi = Si/{si + bi). We incorporated the weights as in Eq. 17.151 
and performed many Monte Carlo trials. The results, showing perfect equivalence between 
and Z^^, appear in Figure. 17.3.21 

7.4 The H Test 

While the Z^ test is unbinned in phase we still must choose the number of harmonics 
to include in the Fourier decomposition. Choosing m too small will result in a loss of 
power against sharply-peaked light curves, while m too large will lose power against broad, 
sinusoidal light curves. To ameliorate this problem, de Jager et al. ([42|) proposed the H 
test, which seeks to estimate the optimal m from the data. They defined 

= max [Zf - c X {i - I)] ,1 < i < m (7.16) 
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400,000 Monte Carlo Trials 




TS = Z^/(2m) TS = Z^/(2m) 



Figure 7.2: The observed (Monte Carlo) and predicted (asymptotic) distribution for the 
weighted test statistic Z^^. The procedure is identical to that described for Figure E3T] save 
for the addition of weights and the omission of the m = 20 case for decreased computational 
burden. The addition of weights is shown to make no difference in agreement with the 
predicted distribution; only insufficient sample size can leads to a discrepancy. 
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and specifically recommended m = 20 and c = 4 as an omnibus test. They provided a 
Monte Carlo calibration of the tail probability and, in a recent paper flT], increased the 
Monte Carlo statistics, giving an estimate 1 — Fh2oW ~ exp(— 0.4 x h) good to /i ~ 70. 
We derive the analytic, asymptotic calibration for all values of m, c, and h in Appendix lAl 
and use this calibration anywhere a conversion from h to chance of Type I error (i.e., "cr") 
is needeclfl. 

In connection with the preceeding material, we note that, since the calibration of the 
H-test relies only on the asymptotic X2m calibration and independence of the underlying 
T^fc variables, any transformation leaving these properties unchanged (e.g. weighting) also 
leaves the and H test calibrations unchanged. Thus, we define a weighted H-test statistic 

Hmw = max \_Zf^ — c x (i — 1)] , 1 < i < m. (7.17) 

In the sequel, we adopt the original values for c and m unless otherwise noted, i.e., 

= max [Zj^ - 4 x (i - 1)] , 1 < i < 20 (7.18) 

and likewise for H (unweighted). 

7.5 Probability-weighted Statistics: Pulsars 

With the definitions and derivations above, we are now ready to implement probability- 
weighted statistics for pulsation searches. By probability- weighted, we mean we seek to 
weight each phase (or rather, cosine or sine thereof) with some number that expresses a 
genuine probability that the photon associated with the phase originated from the (possibly 
pulsed) source. In constructing this quantity, we must first choose between an instanta- 
neous or a time-averaged probability. Allowing for time dependence provides a more precise 
treatment of time- varying backgrounds, but it requires a complicated bookkeeping effort. 
However, for periodicity searches for periods on timescales comparable to or longer than the 
timescale for exposure variations, this bookkeeping is crucial. Time dependence also allows 
for phase-dependence, but we must assume a form for the light curve in order to calculate 
the weights, and this is more appropriate for a likelihood approach. 

^Throughout this section, we use a two-tailed convention when converting to "cr" units, e.g., 2a implies 
a chance probability of 5%. 
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Since we are concentrating on pulsars, we thus choose to compute the time-averaged 
probabihty. (See Appendix [B] for a treatment of searches for periods >> Is.) To define a 
time- aver aged probability, we refer to Eq. I4.10l and write down a shghtly generahzed version 
here, the expected rate from the jth source contributing to the ROI in an infinitesimal slice 
of observed energy and position 

rj{E,n)=T{E,X)e{E,no)Up,i{n;no,E). (7.19) 

Here, we recall that e{E, Qq) is the exposure integrated over the pointing history of the 
S/C, f„psi{(l;(loj E) is the incidence- angle- averaged PSF, and J^{E,X) gives the intrinsic 
flux density of the source. In this expression, E and Q are the measured energy and position 
for a particular photon; this expression is unbinned in all quantities save time. 

We now assume we have performed a likelihood analysis as outlined in Chapter [6] and 
have estimates for Aj, the parameters describing the spectral model for the jth source. 
Then, for a photon observed near E and O, the probability that it originated with the jth 
source is simply 

= J,' ' ' , (7.20) 
ZriiE,n,Xi) 

i=l 

where Ng is the number of sources contributing to the ROI. Wj clearly satisfies the require- 

Ns 

ments of a probability: Wj G [0, 1] and ^ = 1. 

i=l 

Having defined our approach, we can now make contact with previous efforts to improve 
periodicity searches. As we shall discuss in more detail in ^7.5.11 applying any statistical 
test to phases only with an omnibus extraction scheme — e.g., all photons within a fixed 
radius, or within a PSF-dependent cone — is not at all optimal. At least two methods for 
improving on this basic scheme have been attemped. Brown et al. proposed constructing 
"photon probability" maps [31], essentially kernel density estimators for the sky constructed 
in a fashion analagous to that outlined in ^6.31 By using these quantities in light curves 
constructed with COS-B data, they obtained an increase in the SNR ratio for the Crab 
pulsar. This method was profitably employed in searches for periodicity in EGRET data[72^ 

This scheme — essentially probability weighting with a weight depending only on the 
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reconstructed photon position — is similar to our method but does not adequately account 
for varying SNRs. To see this, consider searching for pulsations from a very bright (very 
dim) source. At ah energies of interest, the aperture over which the PSF is appreciably 
different from will have a background component, particularly in the Galactic plane. If, in 
a given energy band, the source is much brighter (dimmer) than the background component, 
we want to include (exclude) events lying the tails of the PSF. This "choice" is naturally 
made by using probability weights derived from a full spectral model in which the relative 
strength of sources enters directly into the weight calculation. 

A second approach was outlined by Ozel and Mayer-Hafielwander[71j who proposed a 
method for constructing an optimal aperture of arbitrary shape in position-energy space 
based on the expected counts, i.e., a tentative spectral model for the candidate pulsar folded 
through the IRF. By adopting an approximate spectral model, the authors effectively es- 
timated the SNR in each of the three energy bands they considered, overcoming partially 
the deficiency of PSF-weighting. However, their prescription was insensitive to light curve 
morphology, i.e., their algorithm returns the same aperture for a given position/spectrum, 
but the optimal aperture certainly depends on the light curve. As we shall show below, 
incorporating weights directly into the statistical test leads to a near-invariance in choice of 
aperture. 

We now proceed with a comparison of weighted and un- weighted versions of the and 
H test. 

7.5.1 Performance: Type 1 Errors 

The primary interest in constructing the weighted versions of pulsation search statistics 
(and the weights) is, of course, to craft tests that find more (real) pulsars. In statistical 
language, we want to minimize false positives (Type I Error) and false negatives (Type II 
error). 

Type I error stems from two sources. First, there is the chance of a fluctuation in the test 
statistic sufficiently large to pass our pre-set threshold for rejection of the null hypothesis, 
i.e., claiming detection of a pulsar. As long as we understand the null distribution of the test 
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statistic, this particular source of error is easy to control: we simply determine in advance 
our overall tolerance to false positives and set the TS threshold accordingly. We must be 
cautious about applying the aysmptotic calibration of the null distribution to small sample 
sizes; in such cases, it is always a good idea to verify the chance probability with a Monte 
Carlo simulation. 

A more insidious source is related to the nature of LAT data. As discussed in Chapter 
m the PSF is a strong function of energy, and at energies of a few GeV and below, there is 
strong source confusion, particularly in the Galactic plane. Simply going to higher energies 
for the better SNR is, however, not feasible for pulsar as the spectra are exponentially 
supressed above a few GeV. Thus, the sensitivity to pulsations can depend quite strongly on 
the particular selection criteria for the data, e.g., the energy-dependent extraction radius. 
Without knowing these cuts in advance, one is tempted to tune the cuts using the test 
statistic itself, a famous source of bias. To compensate, one should correct with a trials 
factor equal to the number of cuts tested. Alternatively, the use of omnibus cuts will result 
in a sub-optimal SNR, increasing the chance of a false negative. Finally, there is the issue of 
diminished sample size — and concomitant departure from the asymptotic calibration — with 
very stringent cuts. 

We expect that probability weighted statistics will be all but immune to the issue of 
aperture selection. High energy photons falling in the core of the PSF will naturally receive 
a weight close to 1, i.e., we include all photons that would be accepted by a stringent cut 
on SNEq- However, we also include a large set of photons from lower energies that receive 
modest weights but undoubtedly still contain pulsed signal. Finally, photons falling in the 
tails of the PSF at any energy receive a very low — essentially zero — weight. Unweighted, 
these low-energy, distant photons would swamp the signal. Weighted, they are neglibible. 
We can then with impunity select a single, large region of interest and be guaranteed good 
performance. Practically, no signal is gained beyond 2 to 3° save for the very strongest 
pulsars, which are of course already known; see Figure 17.5.11 Since we never tune the data 



^Since the weighted statistics are in some sense normahzed, the actual proximity to the maximum value 
of 1 is irrelevant. A more precise statement: "photons falling in the core of the PSF at high energies will 
have significantly higher than average weights". 
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selection, we incur no undue risk of Type 1 error. In this regard, probability weighting 
significantly increases the performance of the and H statistics. 

To make these claims concrete, we compare the weighted H test {H2Qw) and the standard 
H test {H20) over a grid of selections in photon position and energy. The method and results 
are presented in Figure !?. 5.11 Similar results in a single dimension — photons with energies 
above 200 MeV and a grid of position selection criteria — in terms of detection threshold 
flux (defined in the section below) appear in Figure (7. 5. 11 In both examples, the weighted 
statistics are largely insensitive to the extraction criterion. The result can be summarized 
as follows: unweighted statistics depend on SNR while weighted statistics depend primarily 
on signal alone, and the strategy is then to use a single, large aperture for all weighted 
pulsation tests. 

7.5.2 Performance: Type II errors (Sensitivity) 

We must also consider Type II error — false negatives. In astrophysical terms, this is directly 
related to the sensitivity of the method, i.e., the number of photons that must be collected 
before the test statistic passes (for some percent of an ensemble) the detection criterion. 

The easiest way to quantify the sensitivity is by simulation of an ensemble of pulsed 
sources. As for the validation of spectral analysis, we use the Science Tool gtobssim to 
simulate a point source with a diffuse background. For the discussion below, we use a 
realistic pulsar spectrum, in particular 

^ oc (E/GeV)-'' exp -E/E, (7.21) 
cLE 

with r = 1.57 and E^ = 3150GeV. (This spectrum is typical of many pulsars and is, in 
fact, close to the measured spectrum for the middle-aged Vela pulsar.) We place the source 
at the position of the Vela pulsar, (R.A., Decl.) = (128.8463, -45.1735). We choose this 
scenario since (a) many pulsars lie in the Galactic plane, in particular along the spiral arms 
as projected onto the sky and (b) detection in the Galactic plane is much more difficult than 
at high Galactic latitudes, so improved methods are particularly helpful here. As in Chapter 
[31 we simulate a very bright source {J- = J-5 = lO^^ph cm^^ s~^ ) and select subsets of the 
photons in order to achieve a target brightness. Unlike in the spectral analysis case, we use 
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Figure 7.3: A comparison of the dependence on event selection of H20W and H2o- A source 
with flux 10~^ ph cm~^ s~^ was simulated and a "Vela-like" light curve was added to the 
source photons. The spectrum was that of Eq. 17.211 The test statistics were calculated over 
a grid of selection criteria: the y-axis gives the extraction radius (maximum angular distance 
from the pulsar allowed) and the x-axis indicates the threshold energy, the minimum energy 
allowed in the selection. The lefthand (righthand) columns shows the results for the weighted 
(un-weighted) test statistic. The two rows are two indepedent Monte Carlo realizations of 
the source and background. The test statistics were converted to a units and independently 
normalized to the maximum observed value. It is clear that the weighted statistics have a 
minimal dependence on the extraction criterion and, more importantly, do best when given 
the most data (maximum in upper lefthand corner of cut space). On the other hand, the 
significance can be strongly peaked for the standard statistic and the optimal cut varies 
from realization to realization. 
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Figure 7.4: A comparison of the weighted and un-weighted versions of three statistics. 
Photons with energies above 200 MeV are selected according to the extraction radius on 
the X-axis. The light curve in this case is a Gaussian with a = 0.03, so the H test and high- 
harmonic Z test are much more sensitive than the Z2 tests. The salient feature is the relative 
independence of the detection flux threshold on extraction radius for the weighted methods. 
The sensitivity for the standard tests drops by about 50% as we increase the aperture size. 
The results indicate that little is gained for weighted methods with an extraction radius 
larger than 2°. 
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a single realization of the diffuse background. This is not a serious impediment since we 
can randomize the phase of the diffuse photons for each MC realization. Using the same 
spectral models used to simulate the sources, we calculate the weights using gtsrcprob, a 
Science Tool designed to implement the probability- weighting scheme. The base weights 
must be modified to account for changing point source flux; we address this point below. 

When the Monte Carlo events arc generated by gtobssim, the simulated event times have 
no inherent pulsation. In order to convert "unpulsed" photons to a "pulsed" data set, we 
apply the following procedure: 

1. Select a template for the light curve of the pulsar. This will typically be a sum 
of (wrapped) Gaussian functions, and will be normalized, i.e., a probability density 
function for photon phase. 

2. Select photons originating with the pulsed source. This is possible because each event 
generated by gtobssim is tagged with a unique identification for the generating source. 

3. For each selected photon, draw (by Monte Carlo) a phase from the light curve and 
assign it to the photon. 

For background photons, we simply assign a phase drawn from the uniform distribution. 
In this prescription, there is no sense of a time-to-phase mapping. If we wish to use an 
explicit ephemeris, e.g., specifying a reference time, a period, and a period derivative, then 
this timing solution predicts the phase as a function of time. We then must perform the 
following additional steps to make the simulated times consistent with the simulated phases: 

1. For the simulated time, use the time-to-phase mapping to calculate the phase predicted 
by the timing solution. 

2. Use the timing solution to determine the pulsar period at the simulation time. 



3. Adjust the simulated time hy St = P{t) x {^sim — 0mod) niodl, with (f>sim the phase 
drawn from the light curve and 4'mod the phase predicted by the timing solution, i.e.. 
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the smallest possible adjustment that will align the simulated time with the desired 
light curve. Provided P{t) is small enough, the change in exposure is negligible. 

After applying this procedure, one can use the timing solution to convert the adjusted 
simulated times to phase, and these phases will follow the light curve distribution. These 
steps are not necessary for testing pulsed sensitivity and we include them here only for 
completeness. 

With these procedures, we have in hand an ensemble of pulsars at a series of intrinsic 
fluxes with arbitrary light curves. To test the sensitivity, we determine the flux threshold 
for each method given a particular LC morphology, i.e., the flux for which a source would 
be detected according to some criteria (see below) a majority of the time. More specifically, 
we 

1. At a sequence of intrinsic fluxes, evaluate the test statistic for each method for each 
member of the ensemble. At each flux, we must recalculate the probability weights. 
These were originally assigned with the simulation flux values, J-sim, and we now 
wish to described a source with target flux J^tar- The new weights are given by 

2. Using the asymptotic calibration, calculate the chance probability of Type I error (the 
tail probability in the asymptotic distribution) and convert it to a units, i.e., a value 
of Xa is the two-sided tail probability of a normal distribution integrated from X to 
oo. 

3. Determine the flux for which 68% of the ensemble deliver a a value above some pre- 
determined threshold (see below) . This is the detection or flux threshold. To determine 
the 68% level with some level of robustness, we flt the ensemble values with a normal 
distribution and report the appropriate quantile value. 

To determine the flux threshold, we need to invert the calculated quantity (tail probability 
in a units) to the desired quantity (flux threshold). Fortunately, the relationship between 
the two is linear. This dependence may be slightly surprising. Significance canonically scales 
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Figure 7.5: The dependence of significance in a units as a function of flux. The reported 
value at a given flux is that attained by at least 68% of the ensemble of 50 MC realizations. 
The trend is approximately linear, as explained in the main text, making it simple to invert 
the relation. The light curve here is a single Gaussian peak with a = 0.03. 



as \/iV, the collected events, or in this case, since we integrate for exactly one year, \/^, the 
square root of the source flux. However, when we increase the source flux without increasing 
the background flux, we also increase the SNR, and in general significance is proportional 
to the square root of SNR. Taken together, the dependence is linear, i.e., a (x J-. This 
dependence for a particular source is shown in Figure I7.5.2[ 

Above, we mentioned a pre-determined threshold for the tail probability in a units: test 
statistics above this threshold cause us to discard the null hypothesis, i.e., claim detection 
of a pulsed source. In the following, we choose 4(T for as the confidence level for detection. 
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It provides a very small probability of Type I error, about 1 in 15800, and it is also not too 
different from the LAT Collaboration's internal threshold of 5a. On the other hand, as we 
have seen, sample sizes of order 100 (typical for a faint pulsar) begin to depart from their 
asymptotic calibrations at around this confidence level. 

We are now ready to characterize the effect of adding probability-based weights to ex- 
isting statistics on sensitivity. To give a modest survey of the statistics outlined above, 
we present weighted and un- weighted versions of H20, ZI2, and Zf. Recall that H20 is an 
"omnibus" test that depends little on light curve morphology, whereas we expect to 
perform well for light curves with sharp peaks and Z2 to perform well for light curves with 
broad features. 

The primary result, in Figure 17.5.21 shows the flux threshold for each method as a 
function of "duty cycle" , the fraction of the full phase for which there is appreciable pulsed 
emission. In this case, the light curves are single Gaussian peaks with a variety of values for 
their a parameter; the templates are shown in Figure !?. 5. 2[ For a fixed flux, increasing the 
duty cycle decreases the peak flux, or SNR, and the primary dependence is then an inverse 
relation between flux threshold and duty cycle. However, there is additional dependence 
from the nature of each test. It is clear, e.g., that the H20 and perform significantly 
better than Z2 for low duty cycle sources, while Zl maintains a slight edge for broad light 
curves. The H test does a good job for all duty cycles. 

More importantly, it is clear that, independent of pulsar duty cycle (unsurprisingly), 
the weighted statistics enjoy about a factor of two smaller threshold for detection than 
their unweighted counterparts. This has nontrivial implications for the detected pulsar 
population. E.g., if the pulsars follow a linear logN-logS relation with slope a, using weighted 
statistics yields w 2° additional detections. 

Two-peaked light curves 

Many pulsar light curves illustrate twin peaks, often separated by about 0.4-0.5 cycles. 
We repeat the analysis of the previous section using a template comprising two Gaussian 
peaks separated by 0.45 in phase. Real pulsar peaks often have unequal intensities (with an 
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Figure 7.6: The 68% flux detection threshold based on a 4o" detection criterion for a single- 
peaked Gaussian light curve as a function of duty cycle. Light curves with narrow (broad) 
peaks lie to the left (right). The un- weighted thresholds are about twice those of the 
weighted thresholds, indepedent of duty cycle or statistic. 
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Figure 7.7: The templates used for the single- and double-peaked light curves in the deter- 
mination of the detection flux thresholds. The functional form is a wrapped Gaussian, i.e., 

oo 

/((/,) = ^ ^(0 + i) with g{(p,n,a) = {2tt)~'^-^ exp(-O.5(0 - f^f/a^)- In the first panel, 

i=oo 

/i = 0.5, while in the second panel, fii = 0.25, and ij,2 = 0.70. In this panel, the ratio of the 
peak heights is 3/2, and the peak widths are identical. 
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Figure 7.8: The 68% flux detection threshold based on a 4(j detection criterion for a double- 
peaked Gaussian light curve as a function of duty cycle. The flux threshold for the weighted 
statistic is 1.5 — 2.0 times lower than the unweighted statistics. 



energy-dependent ratio.) We reflect that here with a slightly-dominant leading peak. The 
templates are shown in Figure 17.5.21 As seen in Figure 17.5.21 the overall flux thresholds 
are unsurprisingly increased: at a flxed flux, spreading photons between multiple peaks 
decreases the overall SNR. The weighted statistics maintain a comfortably decreased flux 
threshold. 
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Pulsars with DC Emission Components 

Some pulsars (not respecting their nomenclaturqj) emit an appreciable portion of their flux 
as unpulsed 7 rays, e.g. PSR J1836+5925|13j. (Models of magnetospheric emission gen- 
erally allow for a sustained GeV component under certain geometries and viewing angles.) 
Unpulsed emission could be a confounding factor for the weighted test, since unpulsed pho- 
tons from the source will carry heavy weight but be uniformly distributed, decreasing the 
effective SNR. And indeed, in Figure 17.5.21 where we have used a single-peaked light curve 
with a pulsed fraction of 1/2, we see (a) an overall increased flux threshold on the order 
of two, as expected (b) a slight increase in the ratio of weighted-to-unweighted detection 
thresholds. However, the ratio is 1.5 to 2.0, indicating the weighted statistics still offer 
significantly improved sensitivity for pulsars with appreciable DC emission. 

Effect of Uncertainties in Spectral Parameters 

From these demonstrations, it is clear that the probability-weighted statistics have, on 
average, a factor of about 2 improved sensitivity relative to the unweighted versions. One 
potential objection is that, in determining the weights for this validation, we have used 
the known spectrum. With real data, of course, we must first estimate the spectrum in 
order to determine the weights. To assess the impact of using estimated parameters with 
concomitant uncertainty, we make use of the Monte Carlo ensembles developed for the 
validation of spectral analysis and described in Chapter [5l 

For this application, we simulated 20 realizations of a point source at the position of the 
Vela pulsar with the same power law with exponential cutoff spectrum as employed above, 
Eq. I7.21[ To this data we added phase from a single-Gaussian light curve with a = 0.03. 
From Figure [7.5.21 we see that the detection threshold for such a configuration is estimated 
at ~ 8 X lO^^ph cm^^ s~^ . Thus, following the procedure outlined in Chapter [S] for 
constructing ensembles of point sources with a particular flux, we generate 20 realizations 
of this "pulsar" with a flux of 8 x 10~^ph cm^^ s^^ 0. First, we calculate the probability 

^Given the original notion of pulsar as a "pulsating source of radio", with the advent of "radio-quiet 
pulsars" this ship may already have sailed. 

® At this flux, there are of order 100 source photons for the 1-year integration period, allowing for asymp- 
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Figure 7.9: The 68% flux detection threshold based on a 4o" detection criterion for a single- 
peaked Gaussian light curve as a function of duty cycle. Here, the pulsed fraction has been 
decreased to 50%, i.e., half of the photons from the source are uniformly distributed and 
half are drawn from the Gaussian light curve. The flux threshold for the weighted statistic 
is 1.5 — 2.0 times lower than the unweighted statistics, with some slight dependence on duty 
cycle and statistic. 
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weights using the known model parameters for the pulsar and the diffuse background, i.e., 
the "ideal" case. We then perform a ML spectral fit with pointlike to estimate the spectral 
parameters. Since the simulated point source is very dim relative to the background, it 
is inappropriate to attempt to fit a spectral model with three degrees of freedom. We 
therefore fix the cutoff energy to 100 GeV, essentially a power law spectrum. This approach 
is conservative since we are using an incorrect spectral model. Using the best-fit values 
for the flux density and the photon index, we calculate a new set of probability weights. 
Finally, we use a weighted H test to determine the signiflcance (a) using the "ideal" weights 
and (b) using the "measured" weights. 

We compare the results — in a units — in Figure 17.5.21 In general, the detection signifi- 
cance obtained with the "measured" weights is slightly lower than that obtained with the 
"ideal" weights, although in a few cases statistical fluctuations lead to the opposite outcome. 
Comparing the populations means, the the overall significance using measured weights is 
decreased to 92% of the ideal case. Recalling that the fiux threshold depends linearly on 
the significance, we estimate the flux threshold is increased in the case of measured weights 
by ~ 10%. This effect is relatively small compared to the factor of 1.5 — 2.0 increase in 
fiux threshold seen between the weighted and unweighted versions of the H test. We there- 
fore conclude that — even accounting for uncertainties in the spectral parameters used to 
calculate the probability weights — weighted statistics offer a significance improvement in 
sensitivity. 

Comparison of Pulsed and Unpulsed Detection Thresholds 

The machinery established above — performing spectral fits on an ensemble to compute the 
weighted statistics using probabilities estimated from an ML fit — also provides for directly 
comparing the DC (unpulsed) source significance with the pulsed significance. Recall from 
the discussion in Chapter [6] that there is an asymptotic calibration for the test statistic 
obtained from the likelihood ratio for a single parameter whose null value lies on a boundary. 
In that case and here the parameter was the fiux of a point source which is zero in the null 



totic calibration. 
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Figure 7.10: The significance for each member of the ensemble described in the text calcu- 
lated using weights derived from the model used to simulated the data (the Monte Carlo 
"truth") and derived from the ML model. The ML-derived values are very similar to those 
obtained using the known spectrum. 
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case. To apply this calibration, we proceed as in Chapter [6] and fix all parameters but the 
flux. We set the cutoff energy to 100 GeV as above, and we set the photon index to F = 2.0. 

Next, we perform a ML fit on the ensemble of sources described in the previous section 
to determine the best-fit value for the flux density. The DC significance is then determined 
by 



CJDC = ^2^\ogCopt/Co, (7.22) 

with Copt the likelihood value obtained with the best-fit fiux density and Cq the likelihood 
value obtained with the flux density set to zero. 

As in the previous section, we calculate the probability weights using both the Monte 
Carlo truth values of the parameter and with the best-flt spectrum — in this case, with cutoff 
energy and photon index fixed — to estimate a pulsed significance with the H2Qw test. 

We compare the unpulsed and pulsed significances in Figure [7.5.21 The vast majority of 
ensemble members are detected more significantly through pulsations than through unpulsed 
emission. The measured significance for pulsed detections is a factor of 2.2 greater than for 
DC detection. If we assume that the detection fiux threshold is linear in aoc as it is in the 
pulsed significance, this means we require sources to be about twice as bright on average to 
detection them through DC emission rather than pulsed emissiorfj. It is amusing that this 
is about the same factor we observe when comparing weighted and unweighted statistics. 

To determine how fitting the spectra with only one degree of freedom affects the perfor- 
mance of the weighted statistics, we compare in Figure [7. 5. 2 1 the significance computed from 
the "ideal" weights and from the weights determined from the best-fit spectrum (recalling 
that the photon index and cutoff energy are fixed). Interestingly, the "measured" weights 
deliver results comparable to those calculated in §7.5.21 indicating an insensitivity to the 
overall spectral shape. 

7.6 Calculating Pulsed Sensitivity 

One of the primary observables in constraining pulsar emission mechanisms is the overlap 
between radio-loud and 7-ray bright pulsars. An important extension of this is placing the 

^This result, of course, depends strongly on the light curve morphology. 



151 




Figure 7.11: A comparison of the significance of pulsed detection versus unpulsed detection. 
The pulsed detection significance was calculated using the ML best-fit spectrum with the 
weighted H test, while the unpulsed significance was derived from a likelihood ratio test as 
described in the text. For nearly all sources, the pulsations are more strongly detected than 
the DC emission. 
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Figure 7.12: As Figure [7.5.2l but with weights derived from a ML fit with the pulsar spectrum 
fixed to a power law with T = 2.0 and only the flux allowed to vary. 
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strongest constraints possible on the 7-ray flux from radio-loud pulsars. In the material 
above, we saw that weighted statistics outperform both unweighted versions of the same 
statistics and unpulsed significance tests. Thus, pulsed upper limits are a natural candidate 
for constraining the 7-ray emission. 

Although much work has been done on extracting analytic pulsed upper limits (see |40) 
for an enlightening discussion), the weighted method is not particularly amenable to this 
approach. Each source comes with its own set of probability weights that depends strongly 
on the pulsar spectrum and its position on the sky. Thus, while we can extract analytic 
upper limits once we know the weight distribution, by calculating the weights we have 
already done most of the work! The machinery we developed above for determining flux 
detection thresholds is precisely what is needed for an upper limit/sensitivity. 

To determine the sensitivity of Fermi-LAT to pulsed detections, we essentially repeat 
the procedure outlined in the previous section, viz. 

1. Simulate an ensemble of point sources with some spectral shape. 

2. Add phase from, e.g., a single-peaked or a double-peaked light curve. 

3. Select a significance level for detection, e.g., the 4o" criterion adopted in our analyses. 

4. Determine the fraction of the ensemble that must exceed the detection significance to 
claim a detectable flux. (We used 68%.) 

5. Calculate the flux detection threshold as outlined above. 

This flux detection threshold is the desired sensitivity, i.e., the flux level at which we can 
typically (68% of the time) detect the source with significance exceeding our threshold (4cr.) 
Sensitivity maps giving the pulsed fiux detection threshold as a function of position on the 
sky can be generated by repeating this exercise over a grid of positions. 
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7.7 Summary 

We motivated the use of unbinned, Fourier-based statistics for the detection of pulsations 
and we outhned some of their properties. We showed that they could be modified to in- 
corporate probability weights while retaining their calibration. Appealing to gains made 
in using likelihood — i.e., knowledge of the IRF — we proposed that likelihood-derived prob- 
ability weights would offer improvement over statistics formulated with counts alone. We 
defined the time-averaged probability that a given photon comes from a particular source 
in terms of the ML spectral model for the source and its background. We found that the 
weighted versions of the and H test had very little dependence on the particular data 
selection used in contradistinction to the unweighted versions (Figure 17.5. ip . By evaluating 
test statistics for ensembles of simulated pulsars, we estimated the detection flux threshold 
for a given statistical test and showed that the flux threshold for weighted tests was a factor 
of 1.5 — 2 lower than the unweighted test (Figures 17.5.21 and 17.5.21) . Finally, we demon- 
strated that these improvements depended very little on the fraction of pulsed emission 
(Figure [7.5.2p or on exact knowledge of the spectrum (Figures 17.5.21 and I7.5.2p . Addition- 
ally, we found that pulsed sensitivity is about twice that of the unpulsed sensitivity (Figure 
17.5. 2p for a fully-pulsed source. 

Taken together, these results represent a significant gain in the search for periodic signals 
from pulsars and suggest the adoption of weighted statistics — in particular, H20W — as an 
omnibus test for pulsation. 
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Appendix A 

THE ASYMPTOTIC NULL DISTRIBUTION OF Hm 

Recall the statistic is defined as 

Hm = max[Zj? — c{i — 1)] = max[Xj], 1 < i < m. (A.l) 

In its original formulation, m = 20 and c = 4. The addition of the constant term c{i — 1) 
suppresses contributions from the higher harmonics in the null case. (If c = 0, the mth 
value is always the maximum.) 

From the above definition, Hm is an (extreme) order statistic of the Xj, a collection of 
m dependent, non-identically distributed RVs. While the distribution of such a statistic is 
often difficult or impossible to obtain in a useful form, in this case, the relatively simple 
mutual dependence of the variables (they satisfy the Markov property), together with the 
form of the conditional distributions (exponential) , admits a concise, closed form solution 
for the asymptotic null distribution of Hj^. 

A.l The Joint Probability Density Function of X 

Let Xi = Zf — c{i — 1). (For convenience, we interpret Zq as 0, giving Xq = c.) If we 
assume the asymptotic distribution for Z^, then Xj+i can be obtained from Xi by adding 
a X2 distributed variable and subtracting c. That is, 

fxi+i\Xi{xi+i\xi) = xli^i+i + c). (A.2) 

Now, let Xm be a random vector in R"*, such that Hm is the maximum element of Xm- 
We construct the joint pdf for X as a product of conditional distributions: 

fxru^^rn) =/x^|X^_i(^m|^m-l) X jx^.^ (^^m-l |^m-2) X ••• 

xfx2\Xi{x2\xi) X fxiixi). 
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Prom Eq. IA.21 this reduces to 

m 

fxSXra) = Wxl{Xi-Xi-i + c), (A.3) 
i=l 

demonstrating the Markov property. From this form, it is easy to show that the marginal 
distribution for Xi is xii by reducing the integral over the remaining m — 1 terms to a series 
of convohitions. The normahzation may also be verified in this way. 

Inserting the explicit form for xK^) ~ \ 6xp(— |) 6{x), where 9{x) is the Heaviside step 
function restricing support to positive arguments, yields a significant simplification: 

.,m— 1 



fx \Xm) — 



2 



m 



Jj6'(xj - Xi-i + c) 



exp(-^), (A.4) 



Li=l 

where we have defined for convenience a = ^ exp (— §)• 
A. 2 The Cumulative Distribution Function of Hm 

Hm is just the maximum element of the vector Xm- Thus, the probability to observe a 
value less than or equal to hm is simply the integral of the f^ix) over all values of x with 
all elements of x less than or equal to hm- That is, 

hri 



FHrnihm) = lli^J_ dx.jf^iX) 



Inserting the form obtained in Eq. IA.4I yields 



Fh^ (hm) = -^Z— / dXiO {Xi - Xi-l + c) 

«_i L-'-oo 



„m-l r r^m , rr 

exp (-^j . (A.5) 

1=1 '-■'-^ 

The two important features of the joint pdf — the simple relation between the dependent 
variables and the simple form of the conditional distributions — yields an integral expression 
for -Fff^ in which the integrand consists of an exponential in a single variable. The main 
difficulty lies in determining the support of the integrand specified by the product of step 
functions. 

A. 2.1 Reduction of FH^{hm) 

We can develop the integral in Eq. |A]5] recursively. First, we make a change of variables in 
the rightmost integral: Um = Xm — Xm-i + c. This integral is then 
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L 



h-Xm-l+C 



dum exp 



-^m— 1 "I" ^ 



a 



exp 



2^m— 1 



exp 



The lefthand term, togther with the remaining integrals, is the cumulative distribution 
function for an m — 1 harmonic H-test, Fh^_i- The righthand is the integral of unity with 
nontrivial limits of integration. That is, 



where 



FH^ihm) = FH^_,{hm) " a""^ X exp { ) 



" r /-/j 

-^n(^) = / dXi9{Xi-Xi-.i + c) 

.■1 J —OD 



1=1 



Fully reducing Fh^ yields a power series in a: 



hr, 



FHmihm) = 1 - exp ( ^) ^Yl «"-fn(/im)- 



m—1 



n=0 



Thus, the only remaining task is to evaluate /„. 



(A.6) 



(A.7) 



A. 2. 2 Evaluation of In 



We begin with a change of variables to eliminate the step functions. Let Ui = Xi — Yl]=i{^ 



c). Then 



i=i J 



Here, Bi = h+ {i — l)c — X]j=i > '^^ ^o^e that Bn = -Bn-i + c — . We can evaluate 
the n integrals recursively. With each integration, we make the change of integration variable 
to Qi = Bi + {n — i + l)c — Ui- For instance, evaluating the rightmost integral, we have 

dUi Bn 



uh) = n ( / 

n-2 . 



du, 



dUn-l Bn-1 + C - Un-l 



Vn-2 . B, 



B„-i+2c 



dqn-1 (qn-l - c) 



2c 



B„_i+2c 



dqn-1 q-n- 



2c 



Cln- 



1- 
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This form is typical as one continues to integrate. The change of variable always produces 
a monomial in the integration variable. The upper boundary then produces a term in the 
integration variable of the next integral while the lower boundary produces a monomial of 
c. This separation allows a recursive development for and combining the recursive terms 
yields 

We note that /o(^) = 1 a-nd Ii{h) = h. The use of logarithms facilitates the numerical 
evaluation of this expressions for large h and n. 

A. 2. 3 Monte Carlo Validation 

To check our results, we performed Monte Carlo simulations of the H20 statistic in the 
asymptotic null case. Specifically, for each realization of H20, we drew 20 realizations from 
a distribution with one degree of freedom and determined H accordingly (with c = 4.) 
In Figure IA.2.31 we show the results of 10^ Monte Carlo trials for a variety of maximum 
harmonics, viz. Hi, H2, H^, Hg, and H2o. The results are in good agreement with the 
asymptotic distribution derived here. Next, we focus on H20 with lO^'' trials, shown in 
Figure IA.2.31 These results are also in good agreement. 

A. 2.4 Behavior at Large H 

Finally, we consider the behavior of the null distribution at large values of H2o- We have 
seen that exp{—OAH) is an excellent approximation for the true survival function for any 
"practical" applicatiorj^ . Nonetheless, it is interesting to consider very large values of H to 
investigate the trend of the true distribution relative to the simple exponential approxima- 
tion. We show the ratio of the two expression in Figure lA. 2. 41 The exponential first-order in 
H becomes a poor approximation after H ~ 100 (again, sufficiently large for any practical 
need!). By performing a least squares fit to minimize the difference between the asymptotic 
distribution and an approximation of the form exp(— ai? — bH^), we obtain an expression 

^Finite sample size effects are almost certainly larger than the discprepancy between the approximation 
and the asymptotic distribution once H20 is of order 50. 
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Figure A.l: We show the survival function (1 — F(H)) of the asymptotic distribution for 
H and for a sample distribution (A^ = 10^) drawn from the null distribution by simulation. 
To reduce the scale, we divide the survival function for a variable with two degrees of 
freedom, viz. 1 — F{x) =exp(— 0.5x). For each maximum harmonic, the sample distributions 
agree with the asymptotic calibration. 
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Figure A. 2: We show the survival function (1 — F{H)) of the asymptotic distribution for H 
and for a sample distribution (A'^ = 10^^) drawn from the null distribution by simulation. 
To reduce the scale, we divide the survival function by the "first-order" approximation 1 — 
F{H) ~ exp(— 0.4ffl[4Tj. This approximation gives excellent results (< 5% deviation from 
the true distribution) to i7 = 50 and above. We have marked the values H corresponding 
to (two-sided) significance levels of 3cj, 4cj, and Scr for reference. 
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Figure A. 3: The ratio of the true asymptotic distribution for H20 to the approximate 
expression exp(—0. 4/^20)- At H20 ~ 40, the survival function of the asymptotic distribution 
begins to decrease more rapidly than exponential approximation, indicating the importance 
of a quadratic term. We show such a term, determined by least squares, with the red curve, 
which provides an accurate estimation of the survival function over a broad range. 



accurate to 10% over the range < H < 300, 

1 - F{H) « exp(-0.39205i? - O.OOOlOTli?^). (A.9) 
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Appendix B 

PULSATION SEARCHES WITH NON-UNIFORM NULL 

DISTRIBUTIONS 



As discussed above in Chapter [3U, when the null distribution of the phases departs from 
uniformity, the V'fc can retain independence even for large sample sizes. Below, we outline 
two approaches to partially overcome this deficiency. 

First, if the departure from non- uniformity is in some sense small, we can proceed as 
before. We must modify the definition of the ipk to account for the non-uniformity so that 
they are once again asymptotically normally distributed. We begin by explicitly calculating 
the null distribution F^{(j)), based on the IRF and S/C pointing history. Essentially, ~ 
e{(j)), i.e., the null distribution is roughly the exposure calculated as a function of phase. The 
approximation comes in choosing a shape for the (typically unknown) source spectrum used 
as a weight when summing the exposure over energy, and in the form of time-dependent 
background leakage. Provided we are able to adequately characterize the IRF and model 
the time-dependent background, we can calculate the null distribution F^{(j)) and thus the 
moments of the Gk- Then, 

V^ii'k - I^G,)/(TG,) ~ AA, (B.l) 

as before. The moments are somewhat cumbersome to calculate as the distributions of Gk 
are now not monotonic. For an arbitrary phase distribution F$ ((/>), the distribution of Ck is 
given by 

k i-{2i-l-5{x))/2k 
J{2i~l+5{x))/2k 

with 5 = 1 — arccos (x)/7r. The distribution for may be obtained from Eq. IB.2l bv letting 
i —7- i + 1/4. These distribution functions are in turn used to calculate the required moments 
and a. 

^We adopt the notation of Chapter [7] here. 
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Now, provided that the departure from uniformity is mild, we expect the same argu- 
ments about the asymptotic dependence of ipk to apply, at least approximately, and a 
statistic constructed using these new ipk will follow, again approximately, a Xm distribution. 
However, the risk of overestimating the significance of a detection due departure from the 
Xm calibration leads us to prefer a second method. 

In a second approach, we again calculate the null distribution -F$(</>) explicitly from 
the exposure. However, we now switch our fundamental set of rvs from {(pi} to {F^{(l)i)}, 
i.e., the cdf evaluated at the observed phases. If we have calculated correctly, then 

the {F^((j)i)} are guaranteed to follow a uniform distribution in the null case. Further, 
F^{(l)i) G [0,1), so we can use the directly in the (weighted and unweighted) 

tests we have already developed. The drawback to this approach is a distortion of the 
input signal by "processing" by F$((/)j). Even worse, this processing depends on absolute 
phase, i.e., we lose the invariance under translations on the circle that has been a property 
of the statistics we have outlinec|^. However, we should not be surprised by this. The 
departure from uniform exposure inevitably increases our sensitivity to some signals and 
decreases it for others. A few examples appear in Figure [B] where we have taken some simple 
analytic distributions — a uniform distribution, a decaying exponential^, and a sinusoid — 
for the exposure and processed a sharp signal — represented by a von Mises distribution. 
We see that, e.g., when the signal is aligned with the peak of the sinusoidal exposure, the 
processed signal is broadened, whereas if it is near the minimum of the exposure, the signal 
is sharpened. (This gain is offset by the appreciable decrease in the number of collected 
photons.) 



^To be clear — the test statistic itself is still independent of absolute phase, but the processed signal F,i>{(f>) 
does depend on the absolute phase 

^An exponential distibution is a somewhat pathological example since it is discontinuous. 
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Figure B.l: The resulting "signal" after processing an input with F^{(l)i) as outlined in the 
text. We simulate 10^ realizations from a von Mises distribution with a sharp peak and then 
generate histograms of F^{<pi) for each of three distributions. The uniform distribution is 
trivial and represents the unprocessed signal. A sinusoidal distribution might arise, e.g., if 
the period of interest is very close to twice the orbital period of the S/C. An exponential 
distribution is difficult to produce physically but provides insight into asymmetric distri- 
butions. In the upper left panel, we plot /<i>(0) for each of the exposure scenarios. In the 
remaining three panels, we show the processed signal for three different absolute phases of 
the signal. Clockwise from upper right, the signal is centered (p = 0.5, 0.8, and 0.2. 



